Abstract


There  are  multiple  strategies  for  answering  questions.  For  example,  a  statement  is 
sometimes  verified  using  a  plausibility  process,  and  sometimes  by  using  a  direct-retrieval 
process.  It  is  claimed  that  there  is  a  distinct  strategy-selection  phase  and  a  framework  is 
proposed  to  account  for  strategy-selection.  Six  experiments  support  the  assumptions  of 
the  proposed  framework:  The  first  three  experiments  show  that  strategy-selection  is  under 
the  strategic  control  of  the  subjects.  These  experiments  also  indicate  what  contextual 
variables  affect  this  selection.  Experiments  4  and  5  suggest  that  strategy  selection  also 
involves  evaluating  the  question  itself,  while  Experiment  6  suggests  variables  that 
influence  the  evaluation  of  the  question.  This  model  is  shown  to  be  consistent  with 
processing  strategies  in  domains  other  than  question-answering,  viz.,  dual-task  monitoring 
in  divided-attention  situations. 


In  Norman's  paper  Memory,  Knowledge,  and  the  Answering  of  Questions  (1973),  he  points 
out  that  the  process  of  question-answering  is  far  from  simple  and  that  the  "traditional 
psychological  studies  of  memory"  do  not  tell  us  about  the  way  that  knowledge  is  used 
to  answer  questions.  There  has  been  considerable  effort  devoted  to  understanding 
memory  and  memory  retrieval,  as  it  relates  to  recognition  and  recall  tests.  There  are 
formal theories of how to find specific facts in memory (e.g., Anderson, 1972; 1976;
Atkinson & Shiffrin, 1968; Bower, 1972; Kintsch, 1970; Raaijmakers & Shiffrin, 1981;
Ratcliff & Murdock, 1976). There has also been work on other ways of answering
questions, such as searching one's autobiographical memory (e.g., Reiser, Black &
Abelson, 1985; Whitten & Leonard, 1981; Williams & Hollan, 1981; Williams &
Santos-Williams, 1980) and making plausibility judgments (e.g., Collins, 1978a,b). However,
there has been little work on whether people select strategies, and if so, how people
decide which strategy or process to use in order to answer a question.

Many  question-answering  models  of  memory  include  a  preliminary  stage  where  the 
subject  performs  an  evaluation  of  the  query  to  see  if  a  quick  decision  can  be  made  or 
if  more  work  is  required.  For  example,  Norman  (1973)  pointed  out  that  people  do  not 
search memory for the answer to the question, "What is Charles Dickens' telephone
number?".  Rather,  some  initial  pre-processing  allows  us  to  decide  that  further  search 
would  be  fruitless.  A  key  assertion  in  this  paper  is  that  people's  flexibility  in  their 
control  of  memory  retrieval  goes  far  beyond  simply  a  decision  whether  to  "fast  exit." 
People  have  multiple  strategies  for  retrieving  information  and  choose  to  apply  these 
strategies  in  variable  orders.  In  a  theory  where  there  is  variable  strategy  selection,  it 
seems reasonable to propose that people's initial evaluation of their memory, with
respect to the question, affects that strategy selection. There have been a number of
endeavors concerned with self-assessment of knowledge (e.g., Gentner & Collins, 1981;
Hart, 1965; Lachman & Lachman, 1980; Norman, 1973; Nelson, Gerler & Narens, 1984),
although this work has tended not to address strategy selection.


This  paper  presents  a  general  framework  for  the  process  of  answering  questions 
from  memory.  The  paper  focuses  primarily  on  verification  and  recognition  tasks,  but  not 
exclusively, and the framework is shown to generalize to other types of question-answering
situations. There are a number of assumptions to the model proposed here.
Before  reviewing  evidence  for  some  of  the  claims  and  presenting  new  data  in  support  of 
additional  hypotheses,  it  would  be  useful  to  highlight  the  critical  ideas: 


1.  There  are  multiple  strategies  for  question-answering. 

2.  One  strategy  is  to  try  to  find  a  fact  which  encodes  the  answer  in  memory. 
This  is  the  direct  retrieval  strategy. 

3.  Another  strategy  is  to  compute  a  plausible  answer  given  a  set  of  facts  stored 
in  memory.  This  is  the  plausibility  strategy. 

4.  Before  answering  a  question,  a  person  engages  in  an  initial  strategy-selection 
phase  to  decide  which  strategy  or  sequence  of  strategies  to  use. 

5.  The  strategy-selection  stage  consists  of  an  initial  evaluation  of  knowledge 
relevant  to  the  question  followed  by  a  decision  of  which  strategy  to  follow. 
The  initial  evaluation  is  an  automated  process  while  the  decision  is  a 
controlled  process. 

6.  In  the  initial  evaluation,  a  person  assesses  how  familiar  the  words  in  the 
question  are.  The  more  familiar  the  words,  the  more  the  person  is  biased  to 
direct  retrieval. 

7.  In  the  initial  evaluation,  the  person  also  assesses  how  many  intersections  in 
memory  there  are  among  the  words  from  the  question.  The  more 
intersections,  the  more  the  person  is  biased  towards  plausibility. 

8.  The  strategy  decision  process  integrates  information  from  the  initial  evaluation 
with  factors  extrinsic  to  the  question  in  order  to  select  a  strategy.  Extrinsic 
influences  include  instructions  and  probability  that  a  particular  strategy  will  be 
successful. 
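
The way assumptions 6 through 8 fit together can be illustrated with a toy decision rule. This is only a sketch under strong assumptions: the linear combination, the 0-to-1 scales, and the names `Evaluation`, `select_strategy`, and `extrinsic_bias` are all hypothetical and are not part of the framework as stated. The framework says only that familiarity biases toward direct retrieval, that intersections bias toward plausibility, and that extrinsic factors are integrated with both.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Output of the (assumed) automatic initial evaluation of a question."""
    familiarity: float    # 0..1, how familiar the words in the question feel
    intersections: float  # 0..1, normalized count of memory intersections among the words


def select_strategy(ev: Evaluation, extrinsic_bias: float = 0.0) -> str:
    """Pick the strategy to favor.

    extrinsic_bias > 0 favors direct retrieval (e.g. recognition instructions,
    recent success with retrieval); extrinsic_bias < 0 favors plausibility.
    The additive form is an illustrative assumption, not a claim of the paper.
    """
    score = ev.familiarity - ev.intersections + extrinsic_bias
    return "direct_retrieval" if score > 0 else "plausibility"


# Familiar words, few intersections: biased toward direct retrieval.
print(select_strategy(Evaluation(familiarity=0.9, intersections=0.2)))
# Many intersections: biased toward plausibility.
print(select_strategy(Evaluation(familiarity=0.3, intersections=0.8)))
# Extrinsic factors (instructions, success rate) can override the evaluation.
print(select_strategy(Evaluation(familiarity=0.3, intersections=0.8), extrinsic_bias=1.0))
```

Any monotonic combination would serve the same illustrative purpose; the point is only that the controlled decision takes both the automatic evaluation and the extrinsic factors as inputs.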


After  reviewing  the  evidence  that  strategy  selection  (or  bias)  is  involved  whenever  a 
question  is  answered,  the  paper  will  go  on  to  suggest  the  mechanisms  that  people  use 
for deciding quickly which strategy to apply. Understanding these mechanisms and the
variables  that  influence  them  is  the  primary  focus  of  this  paper. 

Arguments  in  Support  of  Strategy  Selection 
Searching  Memory  for  a  Specific  Fact  Is  Not  Always  Preferred 

The  default  assumption  of  memory  theorists  has  tended  to  be  that  when  verification 
of  a  proposition  is  required,  a  careful  search  for  a  specific  proposition  is  the  first 
strategy tried (at least after an initial evaluation).1 The reason that direct retrieval is
assumed to be tried first is that it is also commonly believed that direct matching is a
more efficient process than inferential reasoning (e.g., Anderson, 1976; Anderson &
Bower, 1973; Camp, Collins & Loftus, 1975; Collins & Quillian, 1969; Haviland & Clark,
1974; Kintsch, 1974; Lachman, 1973; Lachman & Lachman, 1980; Lehnert, 1977;
Norman, Rumelhart & the LNR Research Group, 1975; Quillian, 1968; Schank & Abelson,
1977). Lachman and Lachman (1980) articulate this commonly held conception of the
preference for one strategy over another:

When a person needs a particular piece of information--e.g., to answer a
question--she attempts to retrieve it directly. Metamemorial processes return the
information that an answer is or is not in store. If an answer is found,
metamemorial control processes are involved in assessing its adequacy. If no
answer, or an inadequate answer, is retrieved, then the process of inference is
set into motion. (pp. 289-290)

There  is  now  a  growing  body  of  literature  suggesting  that  searching  memory  for  an 
exact match is not always done even in tasks that require such careful inspections (e.g.,
Erickson & Mattson, 1981; Reder, 1982; Reder & Ross, 1983; Reder & Wible, 1984).
Erickson and Mattson asked subjects questions like "How many animals of each kind did
Moses take on the Ark?". Subjects almost uniformly replied "two" even though they knew
that Noah took the animals on the ark. It seems in this case that people do not bother
to carefully inspect their memories for exact matches to the memory probe. Their data
cannot be explained by assuming that subjects accessed the correct memory trace,
noted the discrepancy, but then obligingly gave the intended answer. If that were true,
then subjects should not find it difficult to verbally note the discrepancy when specifically
instructed to do so. In fact, subjects have a great deal of difficulty with such a task:
Reder and Dennler (in preparation) constructed a large number of these trick questions
and told half the subjects to give the answer only when the question was presented in
its correct form (i.e., answer when the question uses "Noah", but say "can't say" when
the question uses "Moses") and told the other half of the subjects to give an answer
based  on  the  "gist"  of  the  question  (i.e.,  regardless  of  whether  or  not  the  question  used 
"Moses").  Subjects  were  significantly  faster  and  more  accurate  in  the  condition  where 
they  could  ignore  whether  the  question  was  properly  formed  or  not.  Subjects  found  it 
very  difficult  to  say  "can’t  say."  It  seems  then  that  question-answering  often  proceeds 
by  loose  inspection  of  the  data-base  rather  than  by  searching  for  one  specific 
proposition. 

The  robustness  of  the  "Moses  illusion"  suggests  that  subjects  rarely  prefer  a 
strategy  that  involves  a  careful  match  to  memory  of  one  specific  fact;  however,  other 
data  suggest  the  opposite.  For  example,  Singer  and  Ferreira  (1983)  found  that  subjects 
were  faster  to  answer  questions  that  involved  an  exact  restatement  of  a  sentence  read 
in  a  story  or  that  involved  inferences  likely  to  be  drawn  during  reading  than  they  were  to 
answer  sentence  paraphrases  or  inferences  not  required  for  story  comprehension.  The 
evidence  that  subjects  tend  to  verify  statements  by  searching  for  an  exact  match  comes 
from experiments involving short delays between study and test. In situations where the
delays are longer, there is evidence that strategy-preference changes from searching for
exact matches to using a plausibility strategy (Reder, 1982; Reder & Wible, 1984).

Strategy-Preference Is Not Stable for the Same Questions in the
Same Task

In  Reder  (1982),  subjects  answered  questions  based  on  short  stories  they  read. 
One  group  of  subjects  was  required  to  decide  whether  a  particular  sentence  had  been 
studied,  while  the  other  group  was  to  judge  whether  a  particular  statement  was  plausible 
given  the  story  read.  (See  Table  1  for  an  example  story.)  Half  of  the  plausible  test 
probes  had  been  presented  in  the  story  (a  different,  random  set  for  each  subject). 
Although  it  might  seem  reasonable  that  the  verbatim  or  direct-retrieval  strategy  would  be 
used  exclusively  for  the  recognition  task  and  the  plausibility  strategy  for  the  plausibility 
task,  the  data  indicated  that  subjects  often  tried  first  the  strategy  that  corresponds  to 
the  other  task.  At  short  delays  between  reading  of  the  story  and  test,  subjects  in  both 
groups  tended  to  prefer  the  direct-retrieval  (or  verbatim-match)  strategy,  while  at  longer 
delays,  both  groups  tended  to  prefer  the  plausibility  strategy. 

Insert Table 1 about here


The  evidence  for  use  of  both  strategies  in  both  tasks  comes  from  the  pattern  of 
latencies  and  errors  with  respect  to  the  plausibility  of  the  test  items.  Items  differed  in 
their  degree  of  plausibility,  half  highly  plausible,  half  moderately  plausible.  Not 
surprisingly,  it  takes  subjects  longer  to  decide  that  a  moderately  plausible  statement  is 
plausible  and  they  less  consistently  judge  moderately  plausible  statements  to  be  plausible 
than  highly  plausible.  However,  these  plausibility  effects  obtain  even  when  subjects  are 
asked  to  recognize  whether  or  not  a  fact  had  been  stated  in  the  story:  moderately 
plausible  statements  were  recognized  more  slowly  than  highly  plausible;  also  moderately 
plausible  statements  were  recognized  less  often  than  highly  plausible  statements, 
regardless  of  whether  or  not  they  had  actually  been  presented  (causing  recognition 
accuracy to be better for highly plausible presented statements and worse for highly
plausible  not-presented  statements).  These  results  are  taken  as  evidence  that  sometimes 
statements  are  "recognized"  by  determining  that  they  are  plausible. 

There  is  also  evidence  of  use  of  the  direct-retrieval  strategy  in  the  plausibility  task. 
Plausible statements that had not appeared in the story were judged implausible more
often than when they had appeared in the story. Also, the size of the plausibility effects
(difference in RT between moderately and highly plausible statements) was larger for
probes that had not appeared in the story, and therefore could not have been verified
by the direct-retrieval strategy.

For  those  statements  that  had  appeared  in  the  story,  there  was  less  use  of  the 
direct-retrieval  strategy  at  a  delay.  This  evidence  comes  from  the  change  in  the  size  of 
plausibility  effects  with  delay.  Differences  in  RT  between  moderately  and  highly  plausible 
statements  increased  with  delay  for  presented  statements.  This  suggests  that  people 
change  strategy  preference  from  direct-retrieval  at  short  delays  to  the  plausibility  strategy 
at  longer  delays. 

The  increase  in  error  rates  in  the  recognition  task  for  not-stated  items,  especially 
for  highly  plausible  statements,  also  indicates  a  shift  in  preference  for  the  plausibility 
strategy  with  longer  delays:  At  longer  delays,  subjects  tend  to  prefer  the  plausibility 
strategy  even  in  the  recognition  task.  Highly  plausible  statements  will  most  likely  be 
judged  plausible,  causing  an  error  for  those  statements  that  had  not  been  presented  in 
the  story. 

Converging  Results  that  Support  a  Strategy-Selection  Stage 

The  interpretation  that  strategy  choice  varies  with  the  delay  between  reading  the 
story  and  test  suggests  that  people  have  a  mechanism  that  allows  them  to  select  a 
strategy for question-answering prior to executing that strategy. Below I review additional
data  that  support  a  preliminary  strategy-selection  phase.  In  addition,  new  experiments  will 
be  reported  that  further  strengthen  the  case  for  an  initial  selection  phase. 

By  assuming  that  people  are  able  to  select  the  strategy  that  they  use  in  tasks 
such  as  question  answering,  a  number  of  results  are  more  easily  interpreted.  For 
example,  an  unusual  finding  from  Reder  (1982)  was  that  subjects  asked  to  make  a 
plausibility  judgment  were  very  slow  to  judge  not-stated  items  as  plausible,  but  only  when 
the  delay  between  reading  and  test  was  short.  As  the  delay  increased,  subjects  actually 
became  faster  than  they  were  at  shorter  delays  to  judge  not-stated  items  as  plausible. 

The  explanation  given  in  Reder  (1982)  was  that  at  short  delays,  the  wrong  strategy 
is  tried  first.  The  flowchart  model  displayed  in  Figure  1  represents  the  probabilistic 
model  offered  in  that  paper.  It  represents  the  branching  alternatives  associated  with 
judging  an  assertion,  regardless  of  whether  the  person  was  asked  to  make  a  plausibility 
judgment  or  a  recognition  judgment.  Each  branch  (reflecting  a  choice  path)  has  a 
probability  associated  with  it  and  was  affected  by  variables  such  as  the  delay  between 
reading  the  story  and  test,  the  official  task  requirements  or  the  plausibility  of  the  test 
probe.


Insert  Figure  1  about  here 


The speed-up for not-presented plausible statements was explained by assuming
that at short delays subjects tried the direct retrieval strategy first (right branch of tree)
and when it failed, they went on to compute the statement's plausibility; at longer
delays, they tried plausibility without first executing the useless strategy (left branch),
thereby saving time.
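
The arithmetic behind this explanation can be made concrete with a small sketch. It assumes the serial version of the model (one strategy tried, then the other) and uses invented parameter values: the function name `expected_latency`, the 400 ms and 2000 ms durations, and the branch probabilities 0.75 and 0.25 are hypothetical and are not estimates from Reder (1982).

```python
def expected_latency(p_direct_first: float,
                     t_failed_retrieval_ms: float,
                     t_plausibility_ms: float) -> float:
    """Expected RT (ms) to judge a NOT-stated plausible probe as plausible.

    Serial-model assumption: with probability p_direct_first the subject tries
    direct retrieval first, fails (the probe was never stated), and only then
    computes plausibility; otherwise plausibility is computed immediately.
    """
    slow_path = t_failed_retrieval_ms + t_plausibility_ms  # wrong strategy tried first
    fast_path = t_plausibility_ms                          # plausibility tried first
    return p_direct_first * slow_path + (1 - p_direct_first) * fast_path


# Hypothetical values: direct retrieval is preferred at short delays only.
short_delay = expected_latency(0.75, 400, 2000)
long_delay = expected_latency(0.25, 400, 2000)
print(short_delay, long_delay)  # 2300.0 2100.0
```

Under these assumed values the not-stated probe is answered faster at the long delay simply because the failed retrieval attempt is paid for less often, which is the pattern the model was offered to explain.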

Put another way, the claim is that people sometimes adopt as a first strategy one
that  is  inappropriate  for  certain  conditions.  This  can  also  explain  a  speed-up  found  in 
the  data  of  Reder  and  Wible  (1984).  That  experiment  required  subjects  to  make  either 
recognition  judgments  or  consistency  judgments  about  statements  after  having  studied 
groups  of  thematically  related  facts.  Judging  consistency  meant  deciding  whether  a 
probe  was  thematically  similar  to  a  studied  statement.  At  short  delays,  subjects  were  very 
slow  to  verify  not-stated,  consistent  items  in  a  consistency  judgment  task.  At  longer 
delays,  they  became  faster  and  more  accurate  to  verify  this  type  of  item,  while  they 
became  much  slower  to  reject  those  items  in  a  recognition  task.  Again,  the  explanation 
was  that  subjects  preferred  the  direct  retrieval  strategy  at  short  delays--an  inappropriate 
strategy for not-stated, consistent items in the consistency judgment task. At longer
delays,  the  direct  retrieval  strategy  was  often  avoided,  saving  time  for  not-stated  items  in 
the  consistency  judgment  task,  but  causing  errors  or  slow  responses  for  not-stated 
consistent  items  in  the  recognition  task. 

Other Factors That Influence Strategy Selection

The  tendency  to  prefer  the  direct  retrieval  strategy  or  the  consistency  strategy  was 
not  just  affected  by  delay  between  study  and  test.  It  was  also  influenced  by  whether 
the  official  task  was  to  make  recognition  judgments  or  consistency  judgments.  Even  in 
situations  where  there  are  not  explicit  instructions,  strategy-selection  can  be  affected  by 
impressions or expectations. For instance, Gauld and Stephenson (1967) found that
subjects'  willingness  to  engage  in  reconstructive  recall  was  affected  by  their  perception 
of  the  emphasis  on  verbatim  recall. 

Reder  and  Ross  (1983)  have  shown  that  even  within  a  recognition  task,  strategy 
preference  can  depend  on  the  relation  between  the  true  and  false  assertions  to  be 
discriminated, i.e., subjects adjust their strategy-selection to reflect the difficulty of the
discrimination. In Reder and Ross, subjects studied related sets of facts about fictitious
individuals (e.g., Marty going to the circus), and had to discriminate studied sentences
from  non-studied  foils.  When  these  foils  were  thematically  related  to  the  studied  facts 
(e.g.,  also  about  Marty  at  the  circus),  subjects  tended  to  adopt  the  direct-retrieval 
strategy.  When  foils  were  unrelated  to  the  circus  theme,  subjects  tended  to  base  their 
"recognition  judgments"  on  a  plausibility  strategy. 

In  a  similar  vein,  Lorch  (1981)  showed  that  in  a  category-membership  task,  when 
unrelated  items  were  used  as  foils,  subjects  tended  to  adopt  a  strategy  that  seemed 
consistent  with  the  semantic  overlap  model  of  Smith  et  al.  When  foils  consisted  of  highly 
related  and  somewhat  related  terms,  but  no  unrelated  terms,  subjects  tended  to  adopt  a 
strategy  of  careful  evaluation  of  subject-predicate  relations. 

It  is  worth  noting  that  I  am  deliberately  vague  about  the  consequences  of  strategy 
selection on the nature of the deployment of the competing strategies, e.g., does the
preferred  strategy  execute  first,  by  itself?  In  Reder  (1982),  I  presented  a  model  in 
which  subjects  sequentially  tried  one  strategy  and  then  another;  however,  it  was  noted 
there  that  an  equivalent  model  would  involve  a  (parallel)  race  between  the  two  strategies 
where  subjects  biased  the  amount  of  cognitive  capacity  given  to  each  strategy.  The 
data  reported  here  do  not  discriminate  between  these  two  conceptions.  The  claim  is 
more  abstract;  people  differentially  deploy  resources  (by  parallel  or  serial  execution)  to 
strategies  as  a  function  of  an  initial  strategy  selection  decision. 

New  Evidence  for  Controlled  Strategy  Selection: 

The  Role  of  Situational  Variables 

The  data  reviewed  thus  far  seem  easily  explained  if  one  assumes  that  people  have 
the  ability  to  decide  how  they  want  to  go  about  answering  questions.  This  assumption 
leaves  open  a  number  of  questions.  For  example,  does  a  person  decide  each  time  he 
or she attempts a question which strategy will be preferred? Or do people select a
preferred  strategy  to  use  throughout  a  task  based  on  knowledge  of  instructions  and 
similarity  of  foils  to  targets?  How  sensitive  are  people  to  the  success  rate  of  a 
particular  strategy?  Can  we  quickly  adjust  strategy  preference  based  on  subtle  features 
such  as  success  with  a  strategy? 

This  section  describes  experiments  that  are  concerned  with  the  extent  to  which 
people  can  fine-tune  their  control  over  what  strategy  they  use  for  question-answering.  It 
is  also  concerned  with  uncovering  what  factors  extrinsic  to  the  question  affect  strategy- 
selection.  These  experiments  also  provide  further  support  that  people  can  and  do  select 
a  strategy  prior  to  executing  one  for  question-answering. 

Experiment  1:  CAN  WE  ADJUST  OUR  STRATEGY  PREFERENCE  TO 
MIRROR  THE  RATIO  OF  PRESENTED  TO  NON-PRESENTED  STATEMENTS 
IN  A  STORY? 

To  what  extent  can  people  discern  the  effectiveness  of  a  particular  strategy  and 
adjust  the  tendency  to  select  a  strategy  on  the  basis  of  its  perceived  effectiveness?  To 
address  this  question,  subjects  read  short,  mildly  interesting  stories  and  then  were  asked 
questions  about  them.  After  each  story,  subjects  were  asked  to  judge  whether  the  test 
probe  was  plausible,  given  the  story.  Unlike  previous  experiments  of  this  type,  the 
percentage  of  explicitly  presented  inferences  varied  from  the  usual  50%.  For  half  of  the 
subjects,  80%  of  the  plausible  statements  were  presented  in  the  stories,  and  for  the 
other  half  of  the  subjects  only  20%  were  presented.  Of  interest  were  whether  the 
propensity to use a strategy was affected by the different ratios of presented to
not-presented sentences, and how quickly subjects adjusted their strategies to adapt to
these ratios.

METHOD

Materials.  Ten  stories  written  by  five  different  authors  were  used.  The  questions 
about  the  stories  and  the  stories  themselves  had  been  used  previously  (Reder, 
1976;  1979;  1982).  The  questions  were  of  three  types:  highly  plausible,  moderately 
plausible,  and  implausible.  The  plausibility  of  plausible  statements  had  been  determined 
by  previous  subject  ratings.  Half  of  the  implausible  statements  were  contradictions. 
(Contradictions  had  not  been  used  in  the  previous  studies.)  Each  contradictory 
statement  was  an  exact  restatement  of  a  statement  from  the  story  except  that  one  word 
was replaced by its opposite. Contradictory statements were considered "presented"
implausibles. The implausible and contradictory statements did not vary systematically on
implausibility. They were constructed by the experimenter with the constraint that
non-contradictions not refer to any specific statement in the story.2 Table 1 gives an
example  story  with  implausible  and  contradictory  statements. 

Design  and  Procedure.  There  were  four  factors  in  this  experiment:  whether  the 
ratio  of  presented  statements  to  non-presented  statements  favored  the  direct-retrieval 
strategy  or  the  plausibility  strategy,  whether  the  statement  itself  was  highly  or  moderately 
plausible,  whether  the  statement  had  actually  been  presented  in  the  story  and  whether 
the  bias  was  still  present.  This  last  factor  was  manipulated  by  the  following  design  of 
the  materials:  The  first  six  of  the  ten  stories  had  either  the  80/20  split  or  the  20/80  split 
of  presented  to  not-presented  plausible  statements;  however,  after  the  first  six  stories, 
the  remaining  four  stories  always  returned  to  the  more  conventional  50/50  split. 

The  experiment  was  conducted  on  an  IBM  personal  computer.  Subjects  read  ten 
stories  which  were  titled  and  presented  in  random  order.  For  each  story,  subjects 
controlled  the  rate  at  which  statements  appeared  on  the  screen:  each  time  the  space 
bar was pressed, a new statement appeared, so long as the previous statement had
been on the screen for a minimum of 0.5 seconds. Subjects were tested after reading
each story. Subjects were asked to decide whether or not each statement was
plausible, i.e., consistent with the information in the story just read. Subjects were told
to  indicate  that  a  statement  was  plausible  or  implausible  by  pressing  the  "K"  or  the  "D" 
key,  respectively.  They  were  further  instructed  to  keep  their  index  fingers  on  these  keys 
at  all  times  while  judging  the  statements,  since  response  times  would  be  recorded,  and 
to  respond  as  quickly  as  possible,  without  sacrificing  accuracy. 

Subjects.  College-age  subjects  who  read  ads  on  the  Carnegie-Mellon  and 
University  of  Pittsburgh  campuses  were  recruited.  Thirty  subjects  were  randomly 
assigned  to  the  Inference-bias  Condition  (i.e.,  only  20%  of  the  plausibles  were  stated  in 
the  stories  for  the  first  six  stories)  and  29  to  the  Direct-retrieval-bias  Condition  (i.e.,  80% 
of  the  plausible  probes  had  been  presented  in  the  stories). 

RESULTS 

Below  in  Table  2  are  listed  the  mean  response  times  to  make  plausibility  judgments  as 
a  function  of  four  factors:  the  plausibility  of  the  statement,  whether  the  statement  had 
been presented or not, the direction of bias (80% presented--biassing direct retrieval
versus 80% not-presented--biassing plausibility), and whether the ratio of
presented-to-not-presented had reverted to 50:50. Medians of correct response times in each condition
for  each  subject  were  computed.  The  times  presented  here  represent  the  mean  of 
these  medians.  If  a  subject  had  no  correct  response  times  in  a  condition,  the  cell  was 
estimated  by  taking  the  grand  mean  and  subtracting  from  it  both  the  effect  size  for  that 
condition  and  the  effect  size  for  that  subject.  Less  than  .003  of  the  cells  needed  to  be 
estimated  in  that  fashion.  Figure  2  presents  the  difference  between  moderately  and 
highly  plausible  response  times  as  a  function  of  biassing  condition,  collapsed  over  the 
stated/not-stated factor. It is plotted for stories 1-6, where bias existed, and stories 7-10,
where the ratio reverted to 50:50. Figure 3 presents the difference between stated and
not-stated,  collapsed  over  plausibility. 
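
The aggregation just described (per-subject medians of correct response times, an additive estimate for empty cells, then means of medians and difference scores) can be sketched as follows. This is an illustration, not the original analysis code: the function names and the sample data are invented, and the sign convention is an assumption. Here an "effect" is taken to be the grand mean minus the relevant marginal mean, so that subtracting both effects from the grand mean gives the usual additive estimate (condition mean plus subject mean minus grand mean).

```python
from statistics import mean, median


def cell_medians(rts):
    """rts: {subject: {condition: [correct RTs in ms]}} -> median per cell, None if empty."""
    return {subj: {cond: (median(times) if times else None)
                   for cond, times in conds.items()}
            for subj, conds in rts.items()}


def fill_missing(cells):
    """Estimate empty cells as grand mean minus condition effect minus subject effect.

    Assumes every condition and every subject has at least one observed cell;
    otherwise statistics.mean raises StatisticsError.
    """
    observed = [v for conds in cells.values() for v in conds.values() if v is not None]
    grand = mean(observed)
    conditions = sorted({c for conds in cells.values() for c in conds})
    cond_mean = {c: mean(conds[c] for conds in cells.values() if conds.get(c) is not None)
                 for c in conditions}
    subj_mean = {s: mean(v for v in conds.values() if v is not None)
                 for s, conds in cells.items()}
    filled = {}
    for subj, conds in cells.items():
        filled[subj] = {}
        for cond in conditions:
            value = conds.get(cond)
            if value is None:
                cond_effect = grand - cond_mean[cond]  # assumed sign convention
                subj_effect = grand - subj_mean[subj]  # assumed sign convention
                value = grand - cond_effect - subj_effect
            filled[subj][cond] = value
    return filled


def mean_of_medians(filled):
    """Average the per-subject cell values within each condition."""
    conditions = sorted(next(iter(filled.values())))
    return {c: mean(subj[c] for subj in filled.values()) for c in conditions}


# Invented data: subject s2 has no correct responses in the "moderate" cell.
rts = {
    "s1": {"high": [1000, 1200, 1100], "moderate": [1500, 1300]},
    "s2": {"high": [900, 1100], "moderate": []},
}
table = mean_of_medians(fill_missing(cell_medians(rts)))
plausibility_effect = table["moderate"] - table["high"]  # difference score of the kind plotted in Figure 2
print(table, plausibility_effect)
```

The same difference-score step, applied to not-stated minus stated cells instead, gives the quantity of the kind plotted in Figure 3.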

Insert  Table  2  about  here 


There are a number of interesting patterns worth noting. Differences were plotted
rather than the raw data because this way, it is easier to see how dramatic the effects
are. Figure 2 plots the extent to which there was an effect due to plausibility of the
question (moderately plausible RT minus highly plausible). The effect due to plausibility
of the question was much greater for subjects biased to use the plausibility strategy.
This was true for both questions that had been presented in the story and for those that
had not. The results shown in Figure 3 are in stark contrast. Figure 3 plots the extent
to which there was an effect due to whether the question had been presented in the
story (not-stated minus stated). In this case, there was a large effect for subjects
biased to use the direct retrieval strategy--just the opposite of Figure 2 which showed
the  plausibility  effect.  These  different  trends  for  the  two  groups  are  exactly  what  one 
would  expect  if  the  ratio  of  questions  previously  presented  to  not  previously  presented  in 
the  story  actually  did  bias  subjects'  preference  for  a  particular  strategy. 

An ANOVA was performed on the data from the first six stories to determine
whether the contrasts mentioned above were significant. For brevity, the reporting of
standard results, e.g., main effects due to whether the probe was stated or not, or the
plausibility of the probe, will be omitted. All expected replications were obtained. There
was a significant interaction on RT between bias (whether 80% of the probes were
stated or only 20% were stated) and whether or not the probe was stated in the story,
F(1,57) = 16.1, p < .01. This interaction represents the finding that the difference between
stated and not-stated items was greater in the condition where the ratio of items biased
subjects to adopt the direct-retrieval strategy.3

The interaction of plausibility with bias in strategy-selection also significantly
affected RT, F(1,57) = 6.2, p < .05 for the first six stories, such that the difference
between highly and moderately plausible statements was larger for subjects who were
biased to use the plausibility strategy. (There was no interaction for the last four
stories.)

Insert  Figures  2  and  3  about  here 


There were also some interesting results with respect to how subjects readjusted to
a 50/50 split and how easily they altered their strategies. For example, there was a
significant interaction of stated x bias x "half" (first 6 stories vs. last 4), F(1,57) = 11.8,
p < .01. That statistic represents the fact that when the bias manipulation stopped, the
stated/not-stated difference decreased for subjects originally biased to use direct retrieval
and increased for subjects originally biased to use plausibility. The effect became
equivalent for the two groups. Another triple interaction which illustrates the same idea
was the interaction of probe plausibility x bias x "half", F(1,57) = 8.1, p < .01. Originally
the plausibility effects were much bigger for subjects biased to use the plausibility
strategy, but they got smaller when the bias disappeared. Conversely, the small
plausibility effect for subjects biased to use direct retrieval increased when the bias
disappeared. Although the plausibility effect appears to have reversed itself when the
bias was removed, the plausibility x bias interaction was not significant for the last four
stories, p > .10.


DISCUSSION 


In  this  experiment,  the  official  task,  judging  plausibility,  was  identical  for  both  groups  of 
subjects,  so  all  differences  in  performance  were  due  to  the  effectiveness  of  a  particular 
strategy.  These  results  strongly  suggest  that  people  are  sensitive  to  the  parameters  of 
the  situation  in  which  they  find  themselves  and  can  rapidly  alter  the  strategy  they 
employ.  In  past  experiments,  subjects  were  shown  to  adjust  their  strategy  as  the  delay 
increased,  perhaps  because  they  knew  that  the  retrieval  strategy  would  be  less  effective 
at  a  delay.  Past  studies  also  showed  that  preference  for  a  strategy  depends  on  what 
subjects  are  actually  asked  to  do,  viz.,  make  recognition  judgments  or  make  plausibility 
judgments.  The  next  study  examines  whether  that  result  is  due  to  official  demands,  per 
se,  or  only  to  subjects'  sensitivity  to  the  probability  of  success  with  a  strategy  in  a  given 
task. 

Experiment 2: WHAT STRATEGY IS PREFERRED WHEN BOTH
STRATEGIES ALWAYS WORK?

It is conceivable that strategy selection would not be affected by official task
instructions if either strategy worked equally well for the required task. In past studies
(e.g., Reder, 1982), where subjects were asked to make recognition judgments, they
were more inclined to use the direct-retrieval strategy. That greater tendency to select
the direct-retrieval strategy may have occurred because they did not want to make all
the errors that would result from adopting the plausibility strategy, and not because of
the official task demands. For this reason, I conducted a study where either strategy
could apply equally well, and looked to see the effects of official task demands and
delay on strategy selection.

In this study, all plausible statements were presented in the story and no
implausibles were presented. Therefore, subjects could use either strategy and always
be correct (assuming that they did not forget the statement or make an erroneous
plausibility judgment). Nonetheless, I expected subjects to show less use of the
plausibility strategy when instructed to make recognition judgments.

METHOD 

Design  and  Materials.  The  general  design  and  materials  were  similar  to  Experiment 
1  with  several  important  differences.  Half  of  the  subjects  were  randomly  assigned  to 
make  recognition  judgments  and  the  other  half  to  make  plausibility  judgments.  Instead 
of  varying  the  proportion  of  plausible  statements  included  in  the  story,  all  plausible 
statements  to  be  judged  about  a  story  had  been  included  as  part  of  the  story.  Half  of 
the  subjects  assigned  to  each  task  were  tested  after  each  story  and  the  other  half  of 
each  task  were  tested  after  reading  all  ten  stories. 

The  stories  and  test  statements  were  the  same  as  in  Experiment  1  except  that  all 
plausible  test  items  were  presented  in  the  story.  None  of  the  implausible  statements 
were  contradictions.  In  this  way,  subjects  asked  to  make  recognition  judgments  could 
use  the  plausibility  strategy  without  making  errors  and  subjects  asked  to  make  plausibility 
judgments  could  use  the  direct-retrieval  strategy  without  making  errors. 

Procedure.  The  experiment  was  quite  similar  to  the  procedure  of  Experiment  1, 
with  only  a  few  relevant  changes.  Depending  on  condition,  subjects  were  either  told 
that  they  would  be  questioned  after  reading  each  story  or  after  reading  all  ten  stories. 
Each  story’s  test  statements  were  preceded  by  the  story’s  title  in  the  Delay  Condition. 
Subjects  assigned  to  the  Recognition  Condition  were  asked  to  decide  whether  or  not 
each  statement  was  exactly  the  same  as  one  that  they  had  read  in  the  story,  and  not 
to respond affirmatively because a statement seemed true, if it had not actually been

Subjects. Sixty-two subjects were recruited from the department’s summer-time
subject  pool  and  randomly  assigned  to  conditions.  In  the  Immediate  Condition,  there 
were  17  subjects  asked  to  make  recognition  judgments  and  14  asked  to  make  plausibility 
judgments.  At  the  longer  delay  (approximately  20  minutes,  after  reading  all  ten  stories), 
there  were  16  assigned  to  the  recognition  task  and  15  to  the  plausibility  task.  Subjects 
received  $4.50  for  participating  both  in  this  30-40  minute  experiment  and  in  one  other 
that  followed. 

RESULTS  AND  DISCUSSION 

Table  3  presents  the  data  from  Experiment  2,  organized  by  delay,  task  and  plausibility  of 
the  statements.  These  data  are  the  means  of  subjects'  correct  median  response  times 
and  the  proportion  of  correct  trials  per  condition.  Analyses  of  variance  were  performed 
on  the  median  correct  response  times  and  accuracy  data  using  the  same  factors 
mentioned  above.  Analyses  were  done  using  different  contrasts  of  plausibility,  plausible 
vs.  implausible,  and  highly  plausible  vs.  moderately  plausible. 

Insert  Table  3  about  here 


Note that at the short delay interval, there was a 115 msec advantage for the
highly plausible statements over the moderately plausible statements in the plausibility
task. This difference grew to a 266 msec advantage at the longer delay, an increase
of 151 msec. When subjects were asked to make recognition judgments, initially they
were actually slightly slower (less than 25 msec) for highly plausible statements than for
moderately plausible statements, but here, too, the tendency to adopt the plausibility
strategy increased: At the delayed test, the advantage of the highly plausible grew to 90
msec, an increase of nearly 115 msec. The contrast using only plausible statements
showed a marginally significant growth in the plausibility effect for both task types,
F(1,58) = 2.8, p < .10.

The ANOVA that contrasted highly with moderately plausible statements produced a
significant interaction of plausibility and task on response times, F(1,58) = 4.0, p < .05,
such that the difference between highly and moderately plausible statements was bigger
for subjects actually asked to make plausibility judgments. The comparison of plausible
statements with implausible statements also interacted significantly with task instructions
for both accuracy and response times, F(1,58) = 16.9 and 15.8, respectively, p < .01.
Apparently, it is easier to say that an implausible statement was not presented (both in
terms of RTs and accuracy) than to judge it as implausible.

These  data  make  clear  that  strategy-selection  preference  is  affected  by  official  task 
requirements  even  when  the  strategy-selection  has  no  impact  on  performance.  A 
different  way  of  putting  it  is  that  subjects  do  not  just  follow  instructions  because  they  do 
not  want  to  make  errors.  Still,  the  delay  between  study  and  test  also  seemed  to 
influence strategy-selection. The present experiment cannot tell us, however, whether
the shift in strategy preference with delay is due to a conscious decision, i.e., being
aware of the delay, or whether it is due to an impression that the information is now
less available.⁴

Experiment  3:  CAN  PEOPLE  SWITCH  STRATEGIES  FROM  QUESTION  TO 
QUESTION  BASED  ON  ADVICE  PRECEDING  EACH  ONE? 

The previous studies examined the differences due to instructions, e.g., asking
subjects to make recognition or plausibility judgments. These instructions were given
only once, prior to reading the stories. Subjects could develop a "frame of mind" and
could keep the same bias for the duration of the experiment. Experiment 1 showed that
subjects can and do shift their bias during an experiment, but the shift shown in that
experiment was not from question to question. It is unclear whether subjects can
consciously alter strategy selection from question to question, at a moment's notice. The
ability to rapidly switch strategies might be fairly unconscious and not something a
person can self-instruct in a matter of seconds.

In  this  experiment  all  subjects  were  asked  to  make  plausibility  judgments. 
However,  before  each  question  appeared  on  the  computer  screen,  the  subject  was  given 
"advice" by the computer as to which strategy was likely to be easier to use to answer
the  next  question.  For  example,  if  the  subject  was  to  judge  the  plausibility  of  a 
statement  that  had  been  (recently)  presented,  the  computer  might  advise  that  the  subject 
"search  memory"  to  try  to  find  the  relevant  statement.  If  the  statement  had  not  been 
presented,  the  advice  might  be  to  try  to  "infer"  the  answer  rather  than  searching  for  it 
directly.  To  make  the  advice  useful,  and  in  order  to  see  whether  the  advice  was  having 
any  effect,  the  advice  was  appropriate  the  majority  of  the  time,  but  not  always.  On 
80%  of  the  trials,  the  advice  was  appropriate,  and  on  20%  of  the  trials,  the  advice  was 
inappropriate. 
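
The 80/20 advice manipulation described above can be illustrated with a minimal sketch. This is not the program used in the experiment; the function name `assign_advice`, the labels "retrieve" and "infer", and the fixed seed are assumptions made only for illustration. The sketch marks exactly 80% of a list of probes to receive appropriate advice (retrieval advice for stated probes, inference advice for not-stated probes) and gives the opposite advice on the remainder.

```python
import random


def assign_advice(stated_flags, p_appropriate=0.8, seed=0):
    """Return one advice label per probe (hypothetical helper).

    Appropriate advice is "retrieve" for a stated probe and "infer" for a
    not-stated probe; inappropriate advice is the opposite label.
    """
    rng = random.Random(seed)
    n = len(stated_flags)
    n_appropriate = round(p_appropriate * n)
    # Decide which trials receive appropriate advice, then shuffle the order.
    appropriate = [True] * n_appropriate + [False] * (n - n_appropriate)
    rng.shuffle(appropriate)

    advice = []
    for stated, ok in zip(stated_flags, appropriate):
        matching = "retrieve" if stated else "infer"
        opposite = "infer" if stated else "retrieve"
        advice.append(matching if ok else opposite)
    return advice


# Example: 10 stated and 10 not-stated probes.
stated = [True] * 10 + [False] * 10
advice = assign_advice(stated)
```

Because the share of appropriate trials is fixed rather than sampled trial by trial, the advice is helpful on exactly the intended proportion of trials, which is the property the design relies on.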

METHOD 

Design  and  Materials.  The  design  consisted  of  the  factors  of  probe-plausibility 
(highly  vs.  moderately  vs.  not  plausible),  probe  presentation  in  story  (stated  vs.  not- 
stated),  and  advised  strategy  (inference  vs.  direct-retrieval  of  statement).  Whether  the 
advised  strategy  was  appropriate  or  inappropriate  depended  on  the  recommendation  and 
on  whether  the  statement  had  been  presented  in  the  story. 

The  stories  were  the  same  ten  used  in  Experiments  1  and  2.  The  questions  were 
also  the  same  as  those  in  Experiment  1  (three  of  the  six  implausibles  per  story 
contradicted  a  presented  statement).  Contradictory  statements  were  considered 
presented, not-plausible statements.

Before each trial subjects received advice, half of the time to search for a specific
fact,  and  half  to  try  to  judge  whether  the  statement  was  plausible.  Half  of  the  plausible 
statements  were  presented  in  the  story  for  all  subjects  and  no  implausible  or 
contradictories  were  presented.  It  was  still  possible  to  design  the  program  to  have  the 
advice  be  correct  exactly  80%  of  the  time.  As  in  all  studies,  assignment  of  questions  to 
condition  was  randomly  determined,  as  was  order  of  presentation  of  stories,  with  one 
exception:  for  the  first  story  pair,  only  correct  advice  was  given  so  that  subjects  would 
more  rapidly  learn  to  attend  to  the  advice. 

Procedure.  The  procedure  was  very  similar  to  Experiments  1  and  2.  Questions 
were  asked  about  stories  after  every  two  stories.  The  first  of  a  pair  was  questioned, 
then the second. The relevant part of the instructions said:

"Before seeing each statement, the computer screen will advise you to use a
particular strategy to judge the plausibility of the statement. This advice is based on
whether the statement or its contradiction was actually presented in the story. You will be
told either 'Try to retrieve a specific fact to use in judgment' or 'Try to infer the answer.'
Most  of  the  time  the  advice  is  going  to  be  helpful.  That  is,  when  you  are  advised  to 
retrieve  a  fact,  either  that  fact  or  its  contradiction  was  stated  in  the  story.  Similarly,  when 
advised  to  infer,  most  of  the  time  neither  the  fact  nor  its  contradiction  was  stated. 
However,  occasionally  the  advice  will  be  wrong,  such  that  the  opposite  advice  would  have 
been  correct.  Should  the  advice  be  wrong,  still  try  to  answer  the  question  correctly." 

The  instructions  continued  with  concrete  examples,  and  advice  about  maintaining 
speed  and  accuracy  and  keeping  index  fingers  on  the  response  keys  during  the  testing 
phase.  The  advice  that  preceded  each  question  was  displayed  for  a  minimum  of  0.5 
seconds. Then hitting the space bar allowed the question to be presented.

Subjects. Eighteen undergraduates enrolled in psychology classes at C-MU
participated for one credit toward a course requirement. One subject’s data were lost
due to technical difficulties with the computer.

RESULTS  AND  DISCUSSION 

Means of median correct response times and proportion of correct responses are
displayed in Table 4 as a function of advice suggested, plausibility of the probe and
whether the probe had been stated in the story. The suitability or appropriateness of
the advice is indicated by a (+) or (-). For example, for the not-stated items, the
inference advice is appropriate and the direct retrieval advice inappropriate. Thus, the
line above the not-stated items says "Inf.(+) Dir. ret.(-)". The data from the first story
pair are not included because no incorrect advice was given while attempting to get
subjects to attend to the advice.


Insert  Table  4  about  here 


An ANOVA was performed using the factors of probe plausibility, probe presentation
and strategy advised. The first thing to note is that subjects were significantly faster if
the advice was correct, F(1,16) = 5.59, p < .05 (interaction of advice and probe
presentation). That is, subjects were faster with the direct-retrieval advice when the
probes were stated in the story and were faster with the inference advice when the
probes were not stated in the story. The RTs for not-stated probes when direct-retrieval
was advised were significantly slower than all others, t(32) = 3.82, p < .01, because this
was the only condition where the advised strategy would not work. (The advice to infer
when the probe was stated will work; however, it is non-optimal since at such a short
delay, direct-retrieval is a faster strategy.) For highly plausible statements, the suitability
of the advice had little effect. This suggests that many highly plausible statements were
inferred by the subject when they had not been explicitly stated.

A second pattern worth mentioning is the difference in RT between the moderately
and highly plausible statements as a function of strategy advised and whether or not the
advice would work. The interaction of stated x plausibility (alternatively called
correctness of advice x type of advice x plausibility) was significant, F(2,32) = 6.54,
p < .01. For stated probes, when searching for specific facts was advised, there was
less than a 50 msec difference between the moderately and highly plausible
statements. For these same probes, the difference was over 300 msec if inference was
advised. For not-stated probes, there was a large difference due to plausibility
regardless of strategy advised since the inference strategy had to be executed ultimately.
When direct retrieval was advised, the moderately plausible were very slow, since two
strategies had to be tried; however, the highly plausible statements did not show this
pattern. This again suggests that the highly plausible inferences were found in memory
during the direct retrieval search (i.e., were inferred during reading), so that the second
strategy did not have to be evoked.

Suitability of the advice did not have a reliable effect on accuracy, but accuracy
was affected by whether the probe had been stated in the story, F(1,16) = 7.20, p < .05.⁵

Summary  and  Implications 

Experiments  1,  2,  and  3  showed  the  extent  to  which  strategy-selection  can  be 
influenced  by  factors  extrinsic  to  the  test  question.  Since  the  ratio  of  presented  to  not- 
presented  statements  affects  the  probability  of  success  of  the  direct-retrieval  strategy, 
subjects  in  Experiment  1  adjusted  their  use  of  strategies  accordingly.  Experiment  1  also 
showed that people quickly adapt to a change in this ratio. To balance Experiment 1,
Experiment 2 showed that subjects' strategy selection is affected by the official task
instructions even when either strategy (direct-retrieval or plausibility judgments) will
produce  the  correct  response.  There  were  larger  plausibility  effects  in  conditions  where 
subjects  were  actually  asked  to  make  plausibility  judgments.  The  plausibility  effects  were 
larger  at  longer  delays  for  subjects  in  both  task  conditions,  confirming  that  other 
variables  also  influence  choice. 

The  results  from  Experiment  3  give  support  to  the  idea  that  people  can  rapidly 
alter  the  strategy  they  use  to  answer  a  question.  Subjects  could  modify  which  strategy 
they  selected  from  question  to  question  on  the  basis  of  an  external  cue  such  as  "try  to 
find  the  fact  in  memory"  or  "try  to  infer  whether  this  statement  is  plausible". 
Plausibility  effects  were  small  if  the  direct  retrieval  advice  was  given  and  would  work. 
The  difference  in  RT  between  questions  that  had  been  stated  and  those  that  had  not 
been  stated  was  small  if  the  advice  was  to  use  plausibility. 

Experiment  3  clearly  indicated  that  performance  suffers  when  the  wrong  strategy  is 
selected. Considering the results of this experiment, those of Experiments 1 and 2, and
all  the  evidence  reviewed  earlier,  it  seems  clear  that  any  question-answering  model 
requires  a  preliminary  strategy  selection  phase.  Given  the  existence  of  a  strategy 
selection  stage,  it  is  interesting  to  ask  whether  factors  besides  situational  or  contextual 
variables  play  a  role  in  this  strategy  selection.  The  distinction  here  is  between  variables 
extrinsic  to  the  test  question  and  variables  intrinsic  to  the  question.  The  next  section 
addresses  this  issue. 


The  Role  of  Stimulus  Evaluation 
in  Strategy  Selection 

There  is  reason  to  believe  that,  in  addition  to  situational  or  extrinsic  variables, 
variables intrinsic to the test question also play a role in strategy selection. Reder
(1982), Reder and Wible (1984), and Experiment 2 all show that subjects shifted their
strategy preference with delay. The explanation for the strategy shift was that at longer
delays information is less accessible, making direct retrieval less desirable. It is possible
that the decision to shift strategies was based on the subject's knowledge of the delay
between study and test; however, I believe that subjects shifted strategy by evaluating
the familiarity of the test questions. This is because in most (non-experimental)
situations, people do not know in advance when the queried information was learned and
consequently need to have some mechanism for assessing familiarity and choosing a
strategy.

The  position  that  sentential  or  intrinsic  variables  should  affect  strategy  selection  is 
consistent  with  other  views  of  question  answering.  A  number  of  models  have  suggested 
that  subjects  make  memory  judgments  by  a  two-stage  process  of  (1)  an  initial  memory 
evaluation,  followed  by  (2)  an  optional,  second  process  that  more  carefully  inspects 
memory.  For  example,  Atkinson  and  Juola  (1974),  in  their  analysis  of  word  recognition, 
propose  that  subjects  initially  assess  the  "familiarity"  of  an  item.  If  the  item  seems 
highly familiar (e.g., "I'm sure I've seen this recently"), they recognize the item; if it is
of  low  familiarity  they  reject  it.  For  intermediate  levels  of  familiarity,  subjects  have  to 
engage  in  a  search  of  memory. 
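
The two-criterion familiarity check attributed above to Atkinson and Juola (1974) can be sketched as a simple decision rule. The function name `familiarity_decision`, the 0-to-1 familiarity scale, and the two criterion values are hypothetical choices made for illustration; they are not parameters reported in the source.

```python
def familiarity_decision(familiarity, low_criterion=0.3, high_criterion=0.7):
    """Map a familiarity value onto a fast response or a memory search.

    The scale and both criteria are illustrative assumptions.
    """
    if familiarity >= high_criterion:
        # Item seems highly familiar: respond "old" without searching.
        return "recognize"
    if familiarity <= low_criterion:
        # Item seems unfamiliar: respond "new" without searching.
        return "reject"
    # Intermediate familiarity: a slower search of memory is needed.
    return "search memory"


for value in (0.9, 0.1, 0.5):
    print(value, familiarity_decision(value))
```

Only items falling between the two criteria incur the cost of the second stage, which is why such models predict fast responses at both extremes of familiarity.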

Smith,  Shoben,  and  Rips  (1974)  proposed  a  similar  idea  for  category  judgments 
(e.g., "a chicken is a bird," "a canary is a bird"). They proposed that subjects
evaluated the raw similarity (or relatedness) between subject and predicate. Again, if
similarity was high, the statement was accepted; if low, it was rejected. For intermediate
values,  a  careful  inspection  of  the  defining  features  was  required,  ignoring  overlaps  on 
"characteristic  features". 


Glucksberg and McCloskey (1981) also have evidence consistent with a preliminary
stage that precedes careful inspection. In this case, the data suggested that subjects
make  a  quick  exit  if  the  preliminary  inspection  fails  to  find  a  connection  among  the 
concepts.  Subjects  were  much  faster  to  respond  "It  is  unknown  whether  John 
possessed  a  gun"  if  they  had  not  studied  anything  about  John  and  guns  than  if  they 
had  studied  that  exact  proposition  "It  is  unknown  whether...".  Presumably  a  first  stage 
determines  whether  there  are  connections  between  John  and  gun.  If  none,  subjects  can 
say  "unknown"  rapidly.  Otherwise  a  second  stage  carefully  inspects  the  nature  of  the 
connection. 

Some  of  the  models  that  postulate  a  preliminary-evaluation  stage  assume  that  the 
outcome  of  the  evaluation  not  only  determines  whether  or  not  to  go  on  to  do  further 
processing,  but  also  how  much  time  should  be  devoted  to  this  second  stage.  For 
example,  Lachman,  Lachman  &  Thronesberry  (1979)  postulate  metamemorial  processes 
that  regulate  how  long  search  continues  for  a  specific  piece  of  information. 

Metamemory  is  accurate  if  it  returns  correct  information  about  the  contents  in 
store.  It  is  efficient  if  it  appropriately  controls  search  durations  so  that  more 
time  is  allocated  to  seeking  information  actually  present,  less  to  information 
actually absent. (p. 543)

Nelson,  Gerler  and  Narens  (1984)  have  found  a  positive  relationship  between  the  feeling 
of  knowing  and  the  amount  of  time  elapsing  before  a  memory  search  was  terminated 
during  recall. 

Despite  all  the  empirical  support  for  the  models  described  above,  it  is  still 
uncertain  whether  initial  evaluation  of  a  question  can  influence  strategy  selection.  This  is 
because  all  of  these  models  assume  that  if  processing  goes  beyond  a  preliminary 
evaluation, there is only one possible strategy to be employed. That strategy is
assumed to be a careful inspection of memory, in contrast to the more sloppy process
used for the preliminary evaluation (e.g., Atkinson & Juola, 1974; Glucksberg &
McCloskey, 1981; Lachman, Lachman & Thronesberry, 1979; Smith, Shoben & Rips,
1974). If a second strategy such as plausibility or inference is considered at all, it is
assumed to always follow the direct-retrieval strategy (e.g., Lachman & Lachman, 1980).
The initial evaluation proposed in models such as Atkinson and Juola is, in reality, the
first of a set of strategies to execute, and no selection is done at all. The goal of the
following sections is to give support to the hypothesis that initial evaluation of a question
can affect strategy selection.

Can  We  Rapidly  Evaluate  Our  Ability  to  Answer  Questions? 

Past work on "feeling of knowing" has demonstrated that when people are unable
to answer a question, they are nonetheless able to accurately assess the probability that
they will be able to recognize the answer (e.g., Nelson et al.; Nelson & Narens, 1980;
Read & Bruce, 1982). Conceivably, the same type of mechanism that allows for these
"feeling-of-knowing" judgments operates automatically even when people can answer the
question.  Given  that  question-answering  requires  a  strategy-selection  stage  prior  to 
strategy  execution,  it  seems  reasonable  that  the  mechanisms  involved  in  "feeling  of 
knowing"  could  be  involved  in  assessing  which  strategy  is  preferable.  For  this  to  be  a 
viable  assumption,  the  "feeling  of  knowing"  or  initial  evaluation  would  have  to  take 
substantially  less  time  than  the  total  time  it  takes  to  answer  a  question. 

To see whether our "feeling of knowing" process could be involved in
question-answering, i.e., precede a strategy execution phase, Experiment 4 asks whether people
can judge their ability to answer a question faster than they can actually answer it. If
we do have a "feeling-of-knowing" process that enables us to select question-answering
strategies, then this assessment should operate faster as an explicit judgment than the
task of actually retrieving the answer to the question even though we have little practice
at overtly assessing our "feeling-of-knowing".

Experiment 4: GAME-SHOW: CAN PEOPLE ESTIMATE ANSWERABILITY
FASTER THAN THEY CAN ANSWER?

To tap into this immediate "feeling-of-knowing" process and compare it to
question-answering, a "game show" format was developed. Subjects see a question and rapidly
decide  whether  they  can  answer  it.  If  they  respond  "no"  or  wait  too  long  after  the 
question  has  appeared  on  the  screen,  they  lose  the  opportunity  to  answer  the  question. 
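
The game-show contingency just described can be written out as a small rule, under stated assumptions. The function name `game_show_outcome` and the 2-second deadline are hypothetical; the text says only that waiting "too long" forfeits the question and gives no specific time limit here.

```python
def game_show_outcome(said_yes, response_time_sec, deadline_sec=2.0):
    """Decide whether a subject keeps the chance to answer a question.

    deadline_sec is an illustrative placeholder, not a value from the study.
    """
    if response_time_sec > deadline_sec:
        # Waiting too long loses the opportunity, whatever the response.
        return "forfeit"
    # A timely "yes" earns the right to answer; a "no" gives it up.
    return "may answer" if said_yes else "forfeit"


print(game_show_outcome(True, 1.2))
print(game_show_outcome(True, 3.5))
print(game_show_outcome(False, 0.8))
```

The point of the rule is that the estimate must be made under time pressure, so it has to rest on a quick assessment rather than on completed retrieval of the answer.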

METHOD 

Overview.  Subjects  were  asked  to  read  questions  pertaining  to  world  knowledge 
and  then,  depending  on  condition,  either  answer  the  question  aloud  or  say  whether  or 
not  they  thought  they  would  be  able  to  come  up  with  the  answer.  In  both  conditions, 
subjects  were  encouraged  to  respond  as  rapidly  as  they  could.  Those  in  the  Estimate 
Condition  who  said  "yes"  were  then  asked  to  come  up  with  the  answer  to  the  question. 

Subjects.  Thirty-one  Carnegie-Mellon  undergraduates  participated  in  the  experiment 
to  partially  fulfill  a  course  requirement.  Fifteen  were  randomly  assigned  to  the  Answer 
Condition  and  the  other  16  to  the  Estimate  Condition. 

Materials.  Questions  were  constructed  for  four  levels  of  difficulty:  20  extremely 
difficult or virtually impossible-to-answer questions (e.g., What is Menachem Begin's
favorite dessert?); 20 difficult (e.g., Who was the first man to climb Mount Everest?); 21
questions of moderate difficulty (e.g., Where did the Greek gods live?); and 28 easy
questions (e.g., How many tentacles does an octopus have?). The questions were all of
approximately the same length; however, the primary concern was to compare
performance between groups that both saw the same materials, so there was no attempt
to ensure uniformity among sentence types. Assignment of questions to difficulty level
was done on an intuitive basis, i.e., no norms were taken.

Procedure. The experiment was conducted on a terminal attached to a PDP/11
using the RSX-11 operating system. Attached to the terminal were a button-box and a
voice-key with a microphone. Subjects were instructed about the task after their random
assignment to one of the two groups. Those in the Answer Condition were told to read
each question on the computer screen and to say the answer into the microphone "as
quickly as possible, without sacrificing accuracy". The answer triggered the voice-key
attached to the computer and the response latency was recorded. After responding,
subjects typed their verbal response into the computer. Subjects were asked to say
"don't know" as quickly as possible into the microphone when they did not know the
answer.

Subjects  in  the  Estimate  Condition  were  instructed  to  say  "yes"  or  "no"  into  the 
microphone as quickly as possible after seeing the question. For "yes" responses, the
subjects  were  then  asked  to  type  in  the  answer  to  the  question. 

To  motivate  fast  responding,  response  times  appeared  on  the  computer  screen  for 
a  few  seconds  after  each  trial.  Subjects  controlled  the  rate  at  which  they  saw  the 
questions  by  pushing  a  start-button  to  initiate  the  next  trial.  Subjects  were  also  alerted 
to  the  problem  of  spurious  triggerings  of  the  voice-key  due  to  random  noises  such  as 
coughs,  and  to  the  problem  of  responses  spoken  too  softly  to  activate  the  voice-key. 
On  each  trial,  there  was  the  opportunity  to  indicate  that  the  response  time  was 
inaccurate  due  to  early  or  late  triggering  of  the  voice-key.  The  experimenter  was 
present  at  all  times  to  ensure  that  subjects  typed  in  the  verbal  response  they  gave  and 
to nullify trials where the voice-key did not accurately record RT. The presentation order
of the questions was randomly determined for each subject.


RESULTS 


A scoring program scored obviously correct responses. All other answers to a given
question  were  presented  to  a  rater  to  score.  Because  the  computer  presented  all 
answers  for  a  given  question  together,  the  rater  had  no  idea  whether  a  particular  answer 
came  from  a  subject  in  the  Estimate  Condition  or  the  Answer  Condition. 

Median response times were used to estimate each subject's performance in each
condition. For each type of question, Table 5 gives the mean time to give the answer
(Answer Condition) or say "yes" (Estimate Condition), the proportion of answers
attempted (i.e., questions for which subjects did not say "don't know" in the Answer
Condition), the percentage of questions answered correctly, and the "accuracy" of
question-answering. Accuracy, viz., the ratio of the proportion correctly answered to the
proportion attempted, refers to how good the subject was at estimating what he or she
knew. (For impossible questions, "can't say" was considered correct.) Table 5 also
gives the mean RTs for questions correctly answered and incorrectly answered, and for
questions for which subjects said "can't say." To save space, these latter numbers are
not partitioned by difficulty of question. The analyses of variance used a 2 (task) x 3
(question difficulty, without impossibles) design for positive response times, regardless of
whether or not they were correct.6
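
The accuracy measure defined above can be illustrated with a short sketch. The function name and the example proportions are hypothetical and are not taken from Table 5; the sketch only shows how the ratio is formed, assuming both inputs are proportions of all questions presented.

```python
def estimation_accuracy(proportion_correct: float, proportion_attempted: float) -> float:
    """Return the ratio of proportion correctly answered to proportion attempted.

    Both arguments are proportions of all questions presented (0.0 to 1.0).
    A subject cannot be correct on more questions than were attempted.
    """
    if not 0.0 <= proportion_correct <= proportion_attempted <= 1.0:
        raise ValueError("need 0 <= proportion_correct <= proportion_attempted <= 1")
    if proportion_attempted == 0.0:
        raise ValueError("accuracy is undefined when no questions were attempted")
    return proportion_correct / proportion_attempted


# Hypothetical subject: attempts 60% of the questions and answers 45% of all
# questions correctly, so roughly three quarters of the attempts are correct.
example_accuracy = estimation_accuracy(0.45, 0.60)
```

On this definition, a subject who attempts fewer questions but is correct on the same number of them obtains a higher accuracy score, which is the pattern at issue when the two conditions are compared.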

Insert  Table  5  about  here 


The most important result to note is that subjects asked to make estimates are
over 25% faster than those asked to actually generate an answer. This difference is, of
course, significant, F(1,29) = 20.0, p < .001. This is true whether one looks only at
positive estimates or at both positive and negative estimates, so it cannot be due to
subjects merely saying a rapid "no" to anything they do not know very well in the
Estimate Condition. On the other hand, it is true that subjects in the Estimate Condition
attempt to answer fewer questions than do subjects in the Answer Condition; however,
this difference is not reliable [F < 1.0] and does not map onto fewer correct judgments in
the Estimate Condition. The groups do not differ in terms of the number of questions
answered correctly [F < 1.0]. Since subjects in the Estimate Condition attempt fewer
questions, they have fewer erroneous attempts, making them significantly more accurate,
where accuracy is defined as proportion correct of those attempted, F(1,29) = 16.4,
p < .001. Clearly then, the speed advantage for the Estimate Condition cannot be due
to a speed/accuracy trade-off, where subjects are merely stopping the same process too
soon.


There were significant effects due to difficulty of question, in terms of number of
questions attempted, number answered correctly, and accuracy of attempts,
F(2,58) = 117.2, 205.4, and 24.7, respectively, p < .001. Not surprisingly, subjects were
less inclined to answer and less accurate with the more difficult questions. There was
also an interaction of question difficulty with task, such that the accuracy advantage of
the Estimate Group over the Answer Group was greater for more difficult question types,
F(2,58) = 4.1, p < .05. Surprisingly, there was no difference in response times due to
question difficulty.7 On the other hand, in the next experiment, there is an effect of
question difficulty on response times.

There is one other noteworthy result. The data were analyzed as a function of
practice, partitioning the data into the first 25% of the experiment and the last 75%.
Since the estimation task is not one that most subjects are used to performing, it
seemed likely that any advantage of that condition would take time to develop. The
advantage of the Estimate Condition was 480 msec during the first 25% of the
experiment, but grew to 986 msec in the last 75%.

DISCUSSION

Although  the  data  support  the  hypothesis  that  there  is  a  mechanism  that  allows  people 
to  evaluate  how  much  they  know  before  they  can  actually  answer  a  question,  the  results 
are  open  to  other  interpretations:  In  the  Answer  Condition  subjects  have  to  articulate  a 
longer response than subjects in the Estimate Condition. There is evidence that
subjects are faster to initiate an articulation of a short word (e.g., "yes") than a long
word (e.g., "baseball") (e.g., Fowler, 1980; Klapp, 1974; Sternberg & Monsell, 1981).
Those effects, however, tend to be on the order of 10 to 15 msec, while the effects
reported here are on the order of 800 msec.

Given  the  sizeable  advantage  of  the  Estimate  Condition,  both  in  terms  of  response 
time  and  accuracy  of  estimation,  it  seemed  worth  demonstrating  that  the  effect  was  not 
due  to  something  as  uninteresting  as  an  advantage  for  getting  to  give  the  same  "yes" 
and  "no"  responses  on  multiple  trials.  Therefore,  in  order  to  control  for  any  advantage 
due  to  binary  responding,  per  se,  Experiment  5  required  subjects  in  both  groups  to 
make  binary  decisions  prior  to  giving  the  answer. 

Experiment  5:  BINARY  RESPONDING  FOR  ESTIMATE  VS.  ANSWER. 

This  experiment  was  similar  to  Experiment  4  with  several  notable  exceptions: 
Subjects  in  both  groups  first  pushed  one  of  two  buttons.  If  they  pushed  the  button 
indicating  that  they  had  the  answer  ready  to  give  (Answer  Condition)  or  thought  they 
could  answer  the  question  (Estimate  Condition),  then  they  went  on  and  said  the  answer 
into  a  microphone  that  recorded  their  latency  to  generate  the  answer.  Two  response 
times  were  collected  for  those  questions  that  had  an  affirmative  response,  namely  time 
for  the  positive  response  and  time  to  articulate  the  answer. 


Getting subjects in the two conditions to treat the two tasks differently was not
trivial since the logical structure of the tasks was identical. The Answer Group was
penalized if they could not come up with the answer shortly after pressing the button.
Further, in the Answer Condition, the question was removed from the screen after
pressing the button since subjects were already supposed to have the answer in mind;
however, in the Estimate Condition, the question remained on the screen until the
subject said the answer into the microphone. Subjects in the Estimate Condition were
given unlimited time to give the answer after estimating that it was answerable.

METHOD 

Materials  and  Design.  Both  the  questions  used  and  the  design  of  the  experiment 
were  identical  to  Experiment  4.  The  only  difference  was  in  the  collection  of  two 
latencies per trial, rather than one.

Procedure.  The  experiment  was  conducted  on  an  IBM  personal  computer,  to  which 
a  button-box,  a  microphone  and  voice-key  were  connected.  Subjects  were  told  that  the 
experiment  was  similar  to  a  television  game  show;  they  would  accumulate  points  for  their 
answers,  which  were  redeemable  for  cash  at  the  end  of  the  experiment. 

Subjects assigned to the Answer Condition were instructed to press the green
(right hand) button as soon as they were sure they had the answer to a question and
were ready to say it, but to press the red (left hand) button if they were sure they did
not know the answer. They were told to indicate their response only after having
searched through memory and either finding or failing to find the answer. Those in the
Estimate Condition were instructed to press the green button as soon as they thought
they probably would be able to find the answer in memory, and to press the red button
as soon as they thought they probably would not be able to find the answer.

Both groups were given points for fast responding to the first button press;
however, in the Answer Group, subjects had to speak the answer into the microphone
within one second of pressing the button or else they lost points. The points awarded
for  each  answer  depended  on  button-press  response  time.  Additionally,  for  affirmative 
responses,  points  depended  on  accuracy  of  answer  and  whether  or  not  the  verbal 
response  was  begun  within  the  one  second  time  limit  (in  the  Answer  Condition).  The 
scoring  method  was  explained  to  both  groups,  and  subjects  were  told  how  many  points 
they  had  accumulated  after  each  trial.  They  also  earned  points  for  negative  responses. 
Because  it  was  easier  to  amass  points  in  the  Estimate  Group,  the  conversion  of  points 
into  money  was  different  for  the  two  groups.8 

Following  an  affirmative  response,  the  computer  screen  prompted  the  subject  to 
say the answer into the microphone. If a subject in the Answer Condition failed to give
the  answer  within  one  second,  the  computer  then  prompted  the  subject  to  still  attempt 
to  give  the  answer  (but  implied  that  the  response  was  late).  After  a  verbal  response 
was  given,  the  correct  answer  was  displayed  on  the  screen  and  the  experimenter  scored 
the  response.  Questions  were  presented  in  random  order. 

Subjects.  Thirty-three  undergraduates  enrolled  in  their  first  psychology  class 
participated  to  partially  fulfill  a  course  requirement.  No  subject  had  participated  in  a 
previous  version  of  this  experiment.  Sixteen  subjects  were  randomly  assigned  to  the 
Estimate  Condition  and  17  to  the  Answer  Condition.  In  addition  to  receiving  course 
credit, they received nominal payment for performance in the task: 2.5 mils/point in the
Estimate Group and 3.125 mils/point in the Answer Group, which averaged about $.50 in
bonus payment.

RESULTS

Table 6 presents the data in a format similar to Table 5. The estimation or attempt
times for questions that were subsequently answered correctly (RT1) are given for each
question type instead of showing all positive response times. (The difference between
the two measures is very slight.) The times are also given for the correct articulation of
the answers (RT2), and the sum of these two response times (RT1 + RT2).

Insert  Table  6  about  here 


The ANOVAs used the same factors as Experiment 4, and used correct RTs for
phase 2 and the sum of these two times. The percentage of questions attempted did
not differ for the two instructional groups, although it did differ as a function of the
difficulty of the questions, F(2,62) = 153, p < .001. Percent correctly answered and
accuracy did not differ across the two groups, but again, did differ with difficulty of
question, F(2,62) = 180.85 and 18.44, respectively, p < .001.

Of greater interest, time to estimate that a question could be answered was
significantly faster than time to indicate an answer was "in mind", F(1,31) = 4.78, p < .05.
Response times for the first phase also differed significantly as a function of question
difficulty (unlike Experiment 4), F(2,62) = 4.24, p < .05, such that for easier questions,
subjects were faster at being ready to give the answer or to estimate they could answer
them.


Time to generate the answer also differed significantly as a function of task
instructions, F(1,31) = 9.18, p < .01, but in the opposite direction. Subjects in the Answer
Condition were faster than those in the Estimate Condition, as they should be, to give the
answer. Question difficulty did not affect time to give the answer, nor was the
interaction with task significant. As the model would predict, the sum of the two
response times did not differ significantly, F(1,31) = .02, for the two tasks, even though
the  time  for  each  part  differed  significantly  (but  in  opposite  directions).  The  reason  the 
model  would  predict  that  the  sums  would  be  roughly  equivalent  is  that  the  estimate- 
phase  RT  should  be  a  subset  of  the  answer  task's  first  phase,  and  the  processing  that 
the  answer  task  did  during  the  first  phase,  namely  finding  the  answer  in  memory,  should 
be  included  in  the  RT  for  the  second  phase  for  the  Estimate  Group.9 
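
The additivity argument can be made concrete with a small sketch. It assumes strictly serial stages (initial evaluation, memory search, articulation) whose durations are the same in both tasks; the stage durations below are hypothetical placeholders, not values from Table 6, and the names are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageTimes:
    """Hypothetical durations (msec) of three serial processing stages."""
    evaluate_ms: float    # initial evaluation of the question
    search_ms: float      # finding the answer in memory
    articulate_ms: float  # initiating the spoken answer


def phase_times(stages: StageTimes, task: str) -> tuple[float, float]:
    """Return (RT1, RT2) for one trial under the serial-stage assumption.

    Estimate task: the button press follows evaluation alone, so the memory
    search falls into the second phase.
    Answer task: the button press follows evaluation plus search, so only
    articulation remains for the second phase.
    """
    if task == "estimate":
        return stages.evaluate_ms, stages.search_ms + stages.articulate_ms
    if task == "answer":
        return stages.evaluate_ms + stages.search_ms, stages.articulate_ms
    raise ValueError(f"unknown task: {task!r}")


# Placeholder durations chosen only to show the pattern.
hypothetical = StageTimes(evaluate_ms=800.0, search_ms=900.0, articulate_ms=400.0)
estimate_rt1, estimate_rt2 = phase_times(hypothetical, "estimate")
answer_rt1, answer_rt2 = phase_times(hypothetical, "answer")
```

Under these assumptions RT1 is shorter in the estimate task, RT2 is shorter in the answer task, and the two sums are identical, which is the qualitative pattern the model predicts.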

DISCUSSION 

Experiment  5  replicates  the  findings  of  Experiment  4,  that  subjects  can  estimate  that 
they  can  answer  a  question  significantly  faster  (without  sacrificing  accuracy)  than  they 
can actually find the answer. That is, subjects are at least as accurate at estimating
answerability  as  they  are  at  attempting  the  answer.  It  is  unlikely  that  this  advantage  is 
due  to  something  trivial  such  as  difficulty  in  articulating  the  response  since  both  groups 
made  a  binary  decision  followed  by  the  complete  answer.  Taken  together,  these  data 
support  the  proposal  that  we  have  the  capability  to  assess  our  memories  before  we  do 
a  careful  search  of  memory. 

The  research  reviewed  at  the  beginning  of  the  paper  and  the  first  set  of 
experiments  argued  strongly  that  strategy  selection  is  part  of  question-answering.  These 
last  two  experiments  have  shown  that  a  sentence  can  receive  an  initial  evaluation  quickly 
enough  to  make  it  reasonable  that  intrinsic  variables  can  also  influence  that  selection. 
Below I outline the kinds of mechanisms and cognitive factors that might be involved in
the initial evaluation process.

Mechanisms  for  Evaluating  "Feeling-of-Knowing" 

The processes involved in the initial evaluation of a question might include (1)
determining how recently the terms in the question have been encountered and (2)
measuring the extent of knowledge stored in memory relevant to the question. These
processes  are  assumed  to  operate  in  the  context  of  a  semantic  network  in  which 
concepts  in  memory  can  become  "active"  from  the  terms  in  the  test  question.  Recency 
(to  be  referred  to  as  familiarity)  and  extent  of  related  information  are  measured  in  terms 
of  activation. 

Familiarity 

Although  there  have  not  been  models  concerned  with  how  feeling  of  knowing  might 
affect  strategy  selection,  there  are  theories  concerned  with  how  familiarity  is  determined. 
For the most part, these theories (e.g., Jacoby & Dallas, 1981; Hasher & Zacks, 1979;
Hintzman, Nozawa, & Irmscher, 1982; Mandler, 1980; Zacks, Hasher & Sanft, 1982) have
postulated  two  separate  mechanisms  for  judging  familiarity.  Although  these  ideas  have 
been  applied  mostly  to  tasks  concerned  with  recognition  or  frequency  judgments,  they 
can  be  incorporated  into  a  framework  which  is  concerned  with  more  complicated  types 
of  memory  queries. 

Mandler  (1980)  has  argued  for  two  separate  types  of  recognition  processes,  one 
that  measures  familiarity  or  occurrence  information,  and  the  second  that  is  a  much 
slower,  more  careful  retrieval  mechanism  or  search.  He  suggests  that  the  first  type  of 
process is affected by shifts in modality (e.g., auditory during study, but written at test)
and  that  this  familiarity/occurrence  information  decays  faster  than  the 
propositional/symbolic  information;  however,  the  familiarity  process  also  can  execute  faster 
during  recognition.  This  proposal  of  an  automatic  process  that  recognizes  familiar  traces 
is  similar  to  the  proposal  of  Hasher  and  Zacks  (1979)  that  there  is  an  automatic 
mechanism that keeps track of frequency information. Hasher and Zacks, like Mandler,
also postulate a more "controlled" (non-automatic) memory mechanism. They found that
many variables which affect recall performance do not affect the frequency judgment
performance, i.e., the latter process does not degrade with increased processing loads,
age, etc. Hintzman, Nozawa, and Irmscher (1982) also have data consistent with these
theories. Jacoby and Dallas (1981) make similar proposals as well; specifically, they
postulate two memory-types, an autobiographical form of memory and a "less aware"
form of "perceptual learning." They note that levels of processing (see Craik & Lockhart,
1972) affect recognition memory but not perceptual recognition. Perceptual recognition
might be thought of as a physical match, and is like the mechanism that keeps track of
frequency information for Hasher and Zacks and Hintzman et al., and is also like the
fast, recognition-memory mechanism of Mandler.

In  the  present  framework,  determining  the  recency  of  exposure  to  a  concept  in  the 
question  is  measured  by  how  active  it  is  relative  to  its  base-activation  level.10  So,  for 
example,  if  a  story  mentioned  certain  words  often  and  some  of  those  words  were 
contained  in  a  test  statement,  the  feeling-of-knowing  mechanism  would  probably  register 
high  familiarity. 

Related  Knowledge  in  Memory 

In  addition  to  the  fast  process  of  determining  "raw  familiarity",  this  initial  evaluation  also 
measures  the  "relatedness"  of  the  concepts  in  the  question  through  the  interconnections 
in  memory.  The  proposal  that  relatedness  affects  decision  times  is  also  not  new.  For 
example,  Rips,  Shoben  and  Smith  (1973)  postulate  differences  in  feature  overlap  to 
explain  the  faster  categorization  times  for  dominant  instances.  Some  research  of  my 
own  (Reder  &  Anderson,  1980;  Reder  &  Ross,  1983)  also  suggests  that  subjects  can 
use  a  relatedness  judgment  to  by-pass  retrieval  of  a  specific  fact  from  memory. 
"Relatedness"  (for  initial  evaluation)  is  defined  as  the  degree  to  which  words  in  a 
question cause activation to intersect in memory. The more intersections detected in
memory as a result of a query, the more potentially relevant information is available for
question-answering.

These two feeling-of-knowing processes, familiarity-detection and intersection-detection,
go on in parallel. It is a useful heuristic to assume that when familiarity is
high,  the  statement  was  seen  recently  and  should  be  relatively  accessible.  Direct  retrieval 
is  a  faster  and  easier  strategy  than  judging  plausibility  when  the  specific  fact  that  must 
be  found  is  relatively  accessible.  Therefore,  when  familiarity  is  high,  direct  retrieval 
should  be  the  preferred  strategy. 

When  the  process  that  detects  intersection  of  activation  determines  that  there  is  a 
lot  or  a  moderate  amount  of  potentially  relevant  information,  there  is  a  bias  to  use  the 
plausibility  strategy.  When  both  biases  exist,  then  the  bias  to  use  plausibility  is 
superseded by the bias to use direct-retrieval. This is because plausibility always has a
longer computation stage, and if the memory search is relatively easy, plausibility does
not have the compensating search time advantage to make it the preferred strategy
(Reder, 1982). Questions that produce little activation are immediately "recognized" as
unanswerable (e.g., what is the rate of mitosis in paramecia?).11
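
The selection heuristic described in the preceding paragraphs can be summarized as a small decision rule. The following is only a sketch: the numeric thresholds, the 0-to-1 activation scale, the function and enum names, and the NO_BIAS outcome (for cases the text does not address) are all assumptions introduced for illustration and are not part of the framework as stated.

```python
from enum import Enum


class Strategy(Enum):
    DIRECT_RETRIEVAL = "direct retrieval"
    PLAUSIBILITY = "plausibility"
    UNANSWERABLE = "unanswerable"  # rejected without a memory search
    NO_BIAS = "no bias"            # assumption: case not covered by the text


def select_strategy(
    familiarity: float,
    intersections: float,
    *,
    familiarity_threshold: float = 0.7,    # hypothetical cutoff for "high" familiarity
    intersection_threshold: float = 0.4,   # hypothetical cutoff for "moderate or more"
    low_activation_threshold: float = 0.1  # hypothetical cutoff for "little activation"
) -> Strategy:
    """Pick a strategy bias from two activation-based measures (0.0 to 1.0).

    familiarity:   activation of question terms relative to their base level
    intersections: amount of intersecting activation among question terms
    """
    # Little activation on both measures: judged unanswerable at once.
    if familiarity < low_activation_threshold and intersections < low_activation_threshold:
        return Strategy.UNANSWERABLE
    # High familiarity biases toward direct retrieval; checking it first means
    # it supersedes the plausibility bias when both are present.
    if familiarity >= familiarity_threshold:
        return Strategy.DIRECT_RETRIEVAL
    # A moderate or large amount of related information biases toward plausibility.
    if intersections >= intersection_threshold:
        return Strategy.PLAUSIBILITY
    return Strategy.NO_BIAS


both_biases = select_strategy(familiarity=0.9, intersections=0.8)
related_only = select_strategy(familiarity=0.3, intersections=0.6)
```

Checking familiarity before intersections is what encodes the precedence of direct retrieval; the two measures themselves are assumed to be computed in parallel before the rule is applied.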

This  next  experiment  tests  a  few  of  the  implications  of  this  initial  evaluation  based 
on  intrinsic  features.  It  is  designed  to  see  whether  our  "feeling-of-knowing"  is  really 
based  on  things  like  familiarity  (recency  of  exposure)  and  number  of  intersections  in 
memory  even  when  it  turns  out  that  these  features  have  no  predictive  validity  as  to  the 
subject’s  knowing  the  answer. 

Experiment  6:  CAN  OUR  ESTIMATION  PROCESS  BE  SUBVERTED  BY 
SPURIOUS  FAMILIARITY? 

Like Experiments 4 and 5, in this experiment half of the subjects were asked to
estimate the answerability of questions, and the other half to answer them directly.
Before subjects began the question-answering or estimation phase, they were asked to
rate the frequency of occurrence of some terms or pairs of terms. These terms were
selected  from  a  random  third  of  the  questions  to  be  estimated  and/or  answered.  Rating 
terms  that  would  be  seen  as  part  of  a  question  was  the  "priming"  manipulation.  Half 
of  the  subjects  were  asked  to  rate  pairs  of  terms  on  conjoint  frequency  (where  both 
terms  were  taken  from  the  same  question);  the  other  half  rated  terms  in  isolation.  For 
both  types  of  rating  groups,  the  same  terms  from  a  question  were  rated  if  a  question 
was  to  be  primed.  For  this  reason,  subjects  who  rated  pairs  had  exactly  half  as  many 
rating  trials. 

In  the  past,  subjects  asked  to  estimate  whether  they  could  answer  a  question  were 
more  accurate  than  subjects  actually  asked  to  answer  them.  The  prediction  is  that  by 
priming  words  from  a  question,  subjects  would  be  "thrown  off"  in  terms  of  using  the 
mechanisms  they  normally  use  to  judge  answerability.  The  Estimate  Group  should 
estimate  that  they  can  answer  more  primed  questions  than  unprimed,  at  least  for  those 
questions  that  are  difficult,  i.e.,  that  they  would  not  otherwise  judge  as  answerable. 
Question  difficulty  was  varied  systematically  so  that  this  prediction  could  be  tested.  The 
Answer  Group  was  not  expected  to  give  more  answers  to  the  primed  questions;  however, 
they  were  expected  to  take  longer  to  say  that  they  could  not  answer  a  question  (i.e., 
search  longer  for  the  answer  before  giving  up)  if  it  had  been  primed.  Also  of  interest 
was  whether  the  effect  of  priming  differs  depending  on  whether  the  terms  were  rated 
together  or  separately. 

METHOD 

Materials. Questions were selected from a set that has been normed for
answerability (Nelson & Narens, 1980). Three levels of difficulty were used: 21
questions were selected from the most difficult of Nelson and Narens's question set
(mean recall = 6.5%), 21 from the mid-range (31.8%), and 21 from the easiest third
(71.5%). For each question, two terms that seemed (a) least common, and (b) most
"central"  to  the  question,  were  selected  to  be  the  candidate  priming  words.  For 
example,  for  the  question  "What  term  in  golf  refers  to  a  score  of  one  under  par  on  a 
particular hole?", the priming terms were "golf" and "par".12 Candidates were needed
for  all  questions,  since  those  selected  for  priming  within  each  level  of  difficulty  were 
randomly  determined  for  each  subject.  In  addition  to  these  63  questions,  15  easy 
practice  questions  were  used. 

Design.  There  were  two  between-subject  factors:  task  group  (estimate  vs. 
answer),  and  type  of  priming  or  rating  task  (rating  individual  terms  vs.  rating  term-pairs). 
There  were  also  two  within-subject  factors:  question  difficulty  (easy  vs.  moderate  vs. 
difficult),  and  whether  the  question  was  primed  by  the  rating  task  or  not  (primed  vs. 
unprimed). Half as many questions were primed as unprimed because subjects might
have become suspicious of the priming manipulation and attempted to alter their
"feeling-of-knowing" strategy if too many questions were primed. Each level of difficulty
had seven primed and 14 unprimed questions.
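
The per-subject assignment of questions to the primed and unprimed conditions might be sketched as follows. This is an illustration under stated assumptions, not the program used in the experiment: the function name, the question identifiers, and the use of a seeded generator are hypothetical, and only the counts (21 questions per level, seven primed) come from the description above.

```python
import random
from typing import Optional


def assign_priming(
    questions_by_difficulty: dict[str, list[str]],
    n_primed: int = 7,
    rng: Optional[random.Random] = None,
) -> dict[str, bool]:
    """Map each question to True (primed) or False (unprimed).

    Within every difficulty level, n_primed questions are drawn at random
    without replacement; the remaining questions are left unprimed.
    """
    if rng is None:
        rng = random.Random()
    assignment: dict[str, bool] = {}
    for questions in questions_by_difficulty.values():
        primed = set(rng.sample(questions, n_primed))
        for question in questions:
            assignment[question] = question in primed
    return assignment


# Placeholder identifiers standing in for the 63 experimental questions.
question_set = {
    level: [f"{level}-{i}" for i in range(21)]
    for level in ("easy", "moderate", "difficult")
}
# One generator per subject gives each subject an independent random draw.
subject_assignment = assign_priming(question_set, rng=random.Random(1))
```

Drawing the primed set separately within each level keeps the seven-to-14 split fixed at every difficulty while still letting the particular primed questions vary from subject to subject.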

Procedure.  Subjects  were  seated  in  front  of  an  IBM  PC  and  randomly  assigned  to 
one  of  four  conditions.  (They  were  unaware  initially  as  to  whether  they  would  be  making 
estimates  or  directly  answering  questions.)  Before  the  question-phase  began,  subjects 
were told to rate the terms that would be displayed on the screen. Subjects were
asked to rate, on a five-point scale, how often a term was encountered during reading
or listening, or how often the pair of terms was encountered together, i.e., conjoint
frequency. The order of presentation of the terms or pairs was randomly determined for
each subject with two constraints: terms from the same sentence could not be
presented sequentially in the "single" conditions and practice items always were rated


Following  the  rating  task,  subjects  were  instructed  about  the  question-answering 
phase,  which  was  similar  to  Experiment  4.  The  Answer  Group  spoke  their  answers 
directly into the microphone, and the Estimate Group said "yes" or "no" orally before
giving  an  answer. 

Subjects.  There  were  76  subjects:  18  in  the  Estimate-single  Group,  20  in  the 
Estimate-pair  Group  and  19  in  each  Answer  Condition.  Forty-seven  subjects  were  paid 
$4.50  for  participating  in  this  and  one  other  experiment.  The  others  were  given  course 
credit.  The  paid  subjects  were  either  students  or  staff  from  Carnegie-Mellon,  while  those 
receiving  credit  were  students  participating  in  their  first  psychology  course. 

RESULTS 

Table  7  is  organized  so  that  the  data  from  subjects  in  the  Estimate  Groups  are 
presented  on  top,  and  the  data  from  subjects  in  the  Answer  Groups  are  given  in  the 
lower  panel.  The  data  are  given  on  separate  rows  for  the  three  levels  of  question 
difficulty.  Each  row  gives  the  proportion  of  questions  attempted,  the  time  to  attempt  an 
answer  (say  "yes"  in  the  Estimate  Condition),  and  the  time  to  say  "don’t  know"  (or 
"no"),  for  both  unprimed  and  primed  questions.  The  data  are  collapsed  over  the  type 
of  rating  task  (pairs  vs.  singles),  since  the  patterns  are  very  similar  for  the  two  priming 
conditions  and  that  variable  did  not  interact  with  any  other.13 


Insert Table 7 about here


estimated by using the grand mean of each group plus the individual's subject effect,
plus the relevant condition's effect.14

Effect of Priming on Proportion of Questions Attempted. The proportion of questions
attempted varied as a function of difficulty, and difficulty also interacted with task
(answer vs. estimate), F(2,144) = 3.08 and 6.01, respectively, p < .01. For both tasks fewer
hard questions were attempted, but the drop-off from easy to hard was more precipitous
in the estimate task (replicating past results). Of more interest, there was a significant
interaction of task with question-difficulty and the priming variable, F(2,144) = 4.52, p < .01
(one-tailed; p = .0125, two-tailed). In the estimate task, subjects estimate 6% fewer
primed questions than unprimed questions among the easy questions, 3% more primed
questions of the moderately difficult, and 7% more of the hard ones. This represents a
significant interaction of question difficulty with priming, F(2,72) = 3.61, p < .05, for the
estimate subjects.

There  was  no  systematic  effect  of  priming  for  proportion  of  questions  attempted  in 
the  answer  task.  The  interaction  of  question  difficulty  with  priming  was  not  significant  in 
the answer task, p > .10. No interaction was expected there either, since people could
not  just  use  their  "feeling  of  knowing"  in  order  to  answer.  The  effect  of  priming  was 
not  consistent  across  the  two  rating  tasks  in  the  answer  task,  while  the  pattern  of 
priming  effects  for  levels  of  difficulty  was  the  same  for  both  rating  tasks  in  the  estimate 
task. 


Effect of Priming on Time to Respond. The time to attempt an answer was, of
course, longer for difficult questions, F(2,144) = 17.54, p < .01. There was also a
significant interaction of difficulty with type of task, F(2,144) = 6.32, p < .01. The slow-down
in RT with more difficult questions was greater in the Answer Condition because
subjects had to actually find the difficult answers, while in the Estimate Condition, a
"feeling  of  knowing"  may  come  almost  as  quickly  for  difficult  questions. 

Priming had opposite effects on the two tasks: Subjects were over 100 msec
faster to estimate that they could answer questions if the questions had been primed.
On the other hand, subjects were 200 msec slower to give an answer if the question
had been primed.15 Presumably the priming facilitated the "feeling of knowing" for both
groups.  For  the  estimate  task,  that  was  all  that  was  needed  and  a  decision  could  be 
made  sooner.  For  the  Answer  Group,  however,  this  "feeling  of  knowing"  led  them  to 
look  longer  for  answers  than  they  would  have  otherwise. 

Effects of Priming on Times to Not Attempt an Answer. The result that subjects
were slowed down by priming in the answer task is also reflected in the times for those
questions that were not attempted. There is a marginally significant interaction of priming
and task, F(1,72) = 3.3, p < .10, reflecting the fact that subjects were 800 msec slower
to say they could not answer a question if it had been primed but were 35 msec faster
for primed questions if they only had to estimate that they could answer them. There
were significant interactions of difficulty x task, F(2,144) = 7.0, p < .01, and difficulty x
priming x task, F(2,144) = 3.1, p < .05. The rating task gave subjects a false "feeling of
knowing" in the answer task too. For the more difficult primed statements, they
therefore tried longer to come up with an answer before realizing that they could not.
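
The direction and rough size of these priming effects can be checked against the cell means in Table 7. The following is a minimal sketch, not the analysis actually performed: it assumes an unweighted average over the three difficulty levels, it reads OCR-damaged table entries such as "3-45" as 3.45, and the names TABLE7_RT, priming_effect_ms and EFFECTS are hypothetical.

```python
# Cell means in seconds, transcribed from Table 7 (rows: Easy, Moderate, Hard).
# Assumption: garbled entries such as "3-45" and "3-05" are read as 3.45 and 3.05.
TABLE7_RT = {
    ("estimate", "attempt"): {
        "unprimed": [2.41, 2.79, 2.72],
        "primed": [2.32, 2.58, 2.70],
    },
    ("answer", "attempt"): {
        "unprimed": [3.42, 4.58, 5.10],
        "primed": [3.64, 4.67, 5.47],
    },
    ("estimate", "not_attempt"): {
        "unprimed": [3.80, 3.05, 2.80],
        "primed": [3.45, 3.34, 2.75],
    },
    ("answer", "not_attempt"): {
        "unprimed": [7.56, 6.04, 4.59],
        "primed": [6.76, 8.30, 5.51],
    },
}


def mean(values):
    """Unweighted arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def priming_effect_ms(task, response):
    """Primed minus unprimed RT in msec; negative means priming sped responses."""
    cell = TABLE7_RT[(task, response)]
    return round(1000 * (mean(cell["primed"]) - mean(cell["unprimed"])))


# One effect per task x response combination.
EFFECTS = {key: priming_effect_ms(*key) for key in TABLE7_RT}

for (task, response), effect in EFFECTS.items():
    print(f"{task:9s} {response:12s} {effect:+5d} msec")
```

Under these assumptions the sketch gives about -107 and +227 msec for attempted questions (estimate and answer tasks, respectively) and about -37 and +793 msec for questions not attempted. These values agree in sign and approximate size with the figures quoted in the text; exact agreement is not expected, because the reported values may weight cells by the number of observations.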

DISCUSSION 

Experiment 6 supports the idea that recent exposure to concepts affects a person's
"feeling of knowing." The results also support the idea that there is a separate process
for initial evaluation, since the manipulation of priming had complementary effects for
subjects asked to answer directly as compared to those asked to estimate whether or
not they could answer: Proportion attempted was affected by priming only in the
estimate task, since subjects in the Answer Condition had to come up with the answer;
on the other hand, time to say "don't know" was affected by priming only in the answer
task.


One  result  in  the  data  to  be  explained  is  why  subjects  were  less  inclined  to 
estimate  that  they  knew  an  answer  to  a  question  for  primed  questions  that  were  easy  to 
answer.  One  explanation  is  that  subjects  were  aware  of  the  priming  manipulation  and 
its  distorting  influence  and  were  trying  to  counteract  it.  If  subjects  were  aware  that  they 
were  more  inclined  to  positively  estimate  when  the  items  seemed  familiar  from  rating, 
and that their accuracy suffered as a result, they might try to counteract the
manipulation. If so, whenever they recognized that the terms in the probe had been
previously rated, they might raise their criterion for "feeling of knowing."

The raising of the criterion would not have the same effect for all levels of
question difficulty. This is because priming does not have the same effect for easy as
it does for hard questions. Hard questions are influenced much more by priming, since
easy  ones  would  be  attempted  anyway.  This  correction  procedure  (raising  the  criterion) 
does  not  completely  counteract  the  priming  manipulation,  and  therefore,  hard  questions 
still  show  a  bias  in  feeling-of-knowing;  however,  since  priming  was  never  needed  for  easy 
questions,  this  correction  lowers  estimates  for  primed,  easy  questions. 

This explanation is consistent with the theory, discussed earlier, that people are
sensitive to contextual variables in the testing situation, e.g., that they notice the effects of the
priming manipulation. We know that people are capable of rapidly altering their strategy.
It seems that subjects are trying to alter the strategy that they would otherwise select on
the basis of such strategic decisions as "this probe contains terms that were in the
rating task; therefore I should underestimate how easy it would be to answer."

General Discussion: Putting the Two
Strategy-Selection Processes Together

This paper has argued that there exists a strategy selection stage and suggested
two types of processes that are involved in the selection: one type is strategic processes
that evaluate situational or contextual information and the other type is less conscious
processes that quickly evaluate how familiar the question seems. Experiments 1, 2, and
3 showed that we are quite sensitive to extrinsic factors when selecting a strategy.
Experiments  4  and  5  showed  that  we  "evaluate"  sentences  fast  enough  for  initial 
evaluation  of  a  question  to  be  part  of  the  strategy  selection  process.  Experiment  6 
indicated  that  recency  of  exposure  to  words  influences  our  "feeling-of-knowing"  process, 
postulated  to  be  involved  in  strategy-selection. 

Both  types  of  processes  generate  a  bias  for  the  strategy  to  be  selected.  If  the 
two  biases  are  in  conflict,  the  stronger  bias  prevails.  An  example  of  the  blending  of 
these  two  processes  comes  from  the  work  of  Gentner  and  Collins  (1981).  They  found 
that people are more likely to decide that an assertion is implausible as the assertion's
importance and their own expertise relevant to the assertion increase. This is the "I
would  know  this  fact  if  it  were  true"  phenomenon.  If  a  person  knows  a  lot  about  an 
area  (strategic  information),  and  there  is  no  convergence  of  activation  from  the  concepts 
in  the  assertion  (initial  evaluation),  a  person  may  be  more  willing  to  make  a  quick  "no" 
response.  Alternatively,  if  there  is  some  convergence  of  activation,  then  knowing  more 
about  an  area  might  lead  one  to  spend  more  time  searching  for  the  information  (Collins, 
personal  communication). 
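
The rule that the stronger of two conflicting biases prevails can be illustrated with a short sketch. It is only an illustration of the verbal claim: the numeric strength scale, the tie-breaking rule in favor of the strategic process, and the names Bias and select_strategy are assumptions introduced here, not parts of the framework as stated.

```python
from dataclasses import dataclass


@dataclass
class Bias:
    """A preference for one strategy, produced by one of the two processes."""
    strategy: str    # e.g. "direct_retrieval" or "plausibility"
    strength: float  # hypothetical scale; larger means a stronger bias


def select_strategy(strategic: Bias, familiarity: Bias) -> str:
    """Combine the strategic (contextual) bias with the initial-evaluation bias.

    If both processes favor the same strategy, that strategy is selected.
    If they conflict, the stronger bias prevails. Ties go to the strategic
    process, which is an arbitrary assumption of this sketch.
    """
    if strategic.strategy == familiarity.strategy:
        return strategic.strategy
    if strategic.strength >= familiarity.strength:
        return strategic.strategy
    return familiarity.strategy


# Advice strongly favors plausibility; the probe feels only mildly familiar.
choice = select_strategy(
    Bias("plausibility", 0.8),
    Bias("direct_retrieval", 0.3),
)
print(choice)  # plausibility
```

A graded or probabilistic combination of the two biases would be an equally reasonable reading of the framework; the winner-take-all rule is used here only because it is the simplest instance of "the stronger bias prevails."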

How  does  the  framework  relate  to  current  views  of  question-answering? 

One of the major influences on theories of question-answering comes from the Cognitive
Science Group at Yale (e.g., Schank, 1982; Reiser, Black & Kalamarides, in press;
Reiser, Black & Abelson, 1985; Kolodner, 1983, 1984). Schank's (1972) original view
was that all inferences are computed "on-line" during comprehension. In this way, when
a question is asked that does not tap information explicitly presented, the answerer can
look at the memory representation of the input and directly retrieve the information since
it was already inferred (e.g., Schank, 1972, 1975; Schank & Abelson, 1977). Lehnert's
(1978) work on question-answering involved inferential mechanisms at the time of test;
however, these were not to compute the answer to the question, per se. Rather, the
inference was to determine the true intention of the question-asker (e.g., some questions
are really requests, as when "it is chilly in here, don't you think?" means "please close the
window") or to figure out what level of answer is appropriate (e.g., would a yes or no
answer be enough?). In her model, the answering mechanism can still be thought of as
direct-retrieval once the question is properly determined. Graesser (1981) also seems to
believe that large quantities of inferences are constructed during comprehension and that
these will be used for question-answering if available. In other words, none of these
theories considers that plausible reasoning could be a preferred question-answering
strategy.

Singer & Ferreira (1983) do not assume that "all" plausible inferences are made in
advance. Nonetheless, they too believe that if an inference is made "on-line," it
will be retrieved. Neither a stored inference nor a presented statement would ever be
verified by inference at time of test if it could be found in memory. Test statements
that were not inferred during comprehension would have to be computed, but that is the
less preferred strategy. The model proposed here is different from both of these points
of view. Regardless of whether or not the inference was computed "on-line" during
comprehension, there is a very strong possibility that a person will not bother to try to
find it. Instead, people will often just try to compute or recompute whether or not an
assertion seems plausible.


In addition, Kolodner (1980) has looked at the retrieval from memory of personal or
"episodic" (Tulving, 1972) information. Reiser (1983) has also looked at retrieval of
personal memories and has developed a model similar in spirit to Kolodner's. He has
also gathered empirical support for his model (Reiser, 1983; Reiser, Black & Abelson,
1985). The type of memory query that their theories address concerns events, e.g.,
"did  you  ever  go  swimming  in  a  river  in  Ohio?"  In  their  framework  inferential 
mechanisms  are  required  to  answer  questions;  however,  in  this  case,  the  inferencing 
must  be  done  to  extract  the  relevant  search  contexts  and  infer  whether  an  item  not 
found  during  initial  search  can  be  found  with  further  search.  Their  view  is  that  one 
memory  retrieval  can  provide  cues  for  a  subsequent  retrieval.  This  is  similar  to  ideas 
described  by  Norman  and  Bobrow  (1979),  Williams  and  Hollan  (1981),  and  Williams  and 
Santos-Williams  (1980).  In  their  research  too,  memory  retrieval  was  viewed  as  a 
recursive,  reconstructive,  problem-solving  process.  That  is,  search  is  a  cycle  of 
specification,  matching,  and  evaluation  that  continually  refines  the  descriptions  of  the 
items  sought  in  light  of  the  evaluation.  Search  for  an  item  retrieves  partial  information, 
which  is  used  to  build  a  more  complete  specification  of  the  target  to  guide  further 
searches.16 
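
The cycle of specification, matching, and evaluation can be sketched as a loop over search contexts. This is a toy illustration and not Kolodner's or Reiser's model: the dictionary-based memory, the "cue" field that stands in for partial information retrieved during a failed search, and the function name successive_refinement are all assumptions made for the example.

```python
def successive_refinement(memory, target, first_context, max_cycles=5):
    """Search memory context by context, refining the context after each failure.

    memory: dict mapping a context name to a list of events; each event is a
            dict with a "features" set and an optional "cue" naming another
            context (hypothetical stand-in for partial information).
    target: set of features that the sought event must contain.
    Returns (event or None, list of contexts searched in order).
    """
    context = first_context  # specification of where to look first
    tried = []
    for _ in range(max_cycles):
        tried.append(context)
        events = memory.get(context, [])
        # Matching: does any event in this context have all target features?
        for event in events:
            if target <= event["features"]:
                return event, tried
        # Evaluation: use partial information to respecify the search.
        cues = [e["cue"] for e in events if e.get("cue") and e["cue"] not in tried]
        if not cues:
            return None, tried  # nothing left to guide further search
        context = cues[0]
    return None, tried


# Toy memory for "did you ever go swimming in a river in Ohio?"
memory = {
    "swimming": [{"features": {"swimming", "pool"}, "cue": "vacations"}],
    "vacations": [{"features": {"swimming", "river", "Ohio"}, "cue": None}],
}
event, searched = successive_refinement(memory, {"swimming", "river", "Ohio"}, "swimming")
print(searched)  # ['swimming', 'vacations']
```

In this sketch the first retrieval fails but supplies a cue ("vacations") that redirects the next search, which is the sense in which one retrieval provides cues for a subsequent retrieval.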

The  successive  refinement  strategy  for  finding  information  in  memory  proposed  by 
Kolodner  and  Reiser  could  be  thought  of  as  yet  another  mechanism  that  can  be  used  to 
answer  questions.  If  this  type  of  strategy  were  selected  for  answering  the  question,  the 
answerer  would  have  to  predict  a  plausible  memory  location  for  an  event  that  might  have 
the  relevant  target  features.  When  the  features  sought  are  not  found  there,  a  new 
location  is  tried  or  an  assessment  is  made  of  the  probability  of  finding  the  required 
information. This type of strategy was not appropriate in the experiments I reported
because the information sought was (a) not autobiographical, and (b) relatively recent.
Indeed, Kolodner and Reiser also distinguished "Question-Answering" from their much
harder memory search, where the former is considered appropriate to answering
questions from stories. For example, Kolodner states that search for relevant contexts is
not needed for question-answering of stories read recently.

The  framework  developed  in  this  paper  can  be  expanded  to  include  this  other  type 
of  question-answering  task.  During  the  first  stage  that  measures  "feeling  of  knowing,"  a 
decision  that  some  relevant  information  can  probably  be  found  would  bias  Stage  2 
against  using  the  direct-retrieval  strategy  because  the  terms  in  a  probe  of  this  kind 
would  not  register  as  having  been  seen  recently.  Because  the  question  deals  with  a 
specific  episode  or  event,  an  evaluation  would  be  made  that  the  "successive-refinement" 
search  strategy  would  be  needed.  Before  going  on  to  search  for  the  relevant 
information,  an  inference  would  be  made  as  to  the  appropriate  first  context  to  search. 

How  does  the  current  framework  relate  to  other  areas  of  cognitive-processing? 

One  idea  developed  in  this  paper  is  that  we  have  a  decision  mechanism  to  select 
the  dominant  strategy  to  answer  a  question.  This  idea  is  in  contrast  to  some  of  the 
extant models of cognition, e.g., Anderson's (1983) ACT* model, where productions
cannot be given probabilities of firing. Productions are either part of the goal set or they
are not. It is not obvious how the production sets or goals would be easily reordered
in dominance, based on advice given from trial to trial. It is also not obvious how these
ideas would be implemented by theories which propose that processing is implemented
in massively parallel architectures (e.g., McClelland, Rumelhart & Hinton, in press;
Fahlman, Hinton & Sejnowski, 1983). As yet, these models lack any obvious
mechanisms for allowing a person to prefer one process to another, varying from trial to
trial.

The mechanisms for deciding which strategy to select, or how to allocate attention
among  competing  question-answering  strategies,  may  be  quite  similar  to  those 
mechanisms  that  allow  a  person  to  allocate  attention  in  a  dual-task  situation.  There  is  a 
large  body  of  literature  concerned  with  allocation  of  resources  when  attempting  to 
perform  multiple  tasks.  When  driving  a  car,  the  amount  of  resources  devoted  to  driving 
as  opposed  to  a  second  task  such  as  conversing  with  a  passenger  can  shift  as  the 
attention demands of the driving task change.

Just as "feeling of knowing" can affect what strategy is selected in question-answering,
"feeling of competence" could affect how much attention is devoted to one
task  as  opposed  to  another.  The  controlled,  strategic  processes  that  are  influenced  by 
external  factors  in  question-answering  could  also  monitor  the  feedback  from  the 
environment  in  the  dual-task  setting  to  see  whether  the  tasks  are  being  performed  well 
enough  (or  one  too  well,  and  the  other  not  well  enough).  Advice  such  as  "watch  the 
road"  or  "listen  to  this  argument  for  why  my  theory  is  better"  may  cause  the  allocation 
of  resources  to  shift,  just  as  the  advice  to  "try  to  find  the  fact  in  memory"  or  "try  to 
infer  the  answer"  can  affect  question-answering  behavior. 

A considerable portion of the research on attention concerns allocation of resources
in a dual-task situation (e.g., Navon & Gopher, 1979, 1980; Kinchla, 1980; Wickens,
1980; Moray & Fitter, 1973; Moray, 1967; Shaw, 1980; Norman & Bobrow, 1979;
McLeod, 1977; Sperling & Melchner, 1979; Schneider & Shiffrin, 1977; LaBerge, 1975;
Gopher & North, 1974, 197"). Some of this research is concerned with the issues of
whether the processes for dual tasks operate in parallel or
sequentially, whether the subject is strategic in resource allocation between the two
tasks, and whether time-sharing or resource allocation abilities are learned and can be
improved (e.g., Fisher, 1975a, 1975b, 1977, 1980; Schweickert, 1978, 1980, 1983; Shaw,
1980; Lane, 1980; Damos & Wickens, 1980).

Work by Posner, Nissen and Ogden (1978) gave impetus to the idea that we can
allocate attention differentially by using strategic information. They looked at how
response times to the onset of a light varied as a function of the validity of a directional
cue. Subjects performed faster with a valid cue (correct advice as to which position
would light up) than when not given a cue, and performed slower with an invalid cue.

In this literature, the strategic components that allocate resources are called
"time-sharing" skills. Lane (1980) and Damos and Wickens (1980) have looked at the
acquisition of time-sharing skills. Lane found that the difference between central and
incidental task performance increases with age, and that the correlation between the two
tasks decreases with age. However, contrary to the common assumption that such
developmental differences are due to an acquired ability to process information selectively,
they may be due to capacity trade-offs, i.e., devoting most of the processing resources
to one of the dual tasks. Damos and Wickens found that time-sharing skills are learned
with  practice  and  can  transfer  to  new  dual-task  combinations.  Navon  and  Gopher  (1979) 
also  found  that  with  extensive  practice,  subjects  seemed  able  to  achieve  higher  levels  of 
performance  on  both  tasks.  One  possibility  they  consider  is  that  the  processes  involved 
in  the  allocation  of  resources  get  better  with  practice. 

Extensions 

Although  the  strategies  involved  in  other  cognitive  tasks  are  undoubtedly  different 
from  those  for  question-answering,  it  is  interesting  to  conjecture  on  the  similarity  of  the 
factors  affecting  strategy  selection.  Consider,  for  example,  mathematics.  We  can  first 
assess our familiarity with a problem. We recognize that certain classes of problems
can be worked out without consulting a math book, while others require looking up the formula
for solution. Even within multiplication we may assess how familiar a problem is before
selecting  a  procedure  for  solution.  Common  but  complex  problems  such  as  12  x  12  are 
recognized  as  directly  stored,  but  16  x  12  might  be  evaluated  as  one  that  should  be 
broken  down  into  subcomponents  to  solve.  Siegler  and  Shrager  (1984)  have  evidence 
that  children  do  similar  things  for  arithmetic  although  they  apparently  always  try  direct 
retrieval  before  computing  the  answer. 
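
The multiplication example can be made concrete with a small sketch in which a familiarity check precedes the choice between retrieval and computation. The assumptions are mine: that the "directly stored" facts are the times tables up to 12 x 12, that unfamiliar problems are split into tens and ones, and that the names STORED_FACTS and multiply are merely illustrative. Operands are assumed to be positive integers.

```python
# Hypothetical store of directly retrievable facts: times tables up to 12 x 12.
STORED_FACTS = {(a, b): a * b for a in range(1, 13) for b in range(1, 13)}


def multiply(a, b):
    """Return (product, strategy), choosing the strategy by familiarity.

    If the problem is recognized as stored, the answer is retrieved directly.
    Otherwise the larger operand is broken into tens and ones and the partial
    products are summed (one of several possible decompositions).
    """
    if (a, b) in STORED_FACTS:
        return STORED_FACTS[(a, b)], "direct_retrieval"
    big, small = max(a, b), min(a, b)
    tens, ones = divmod(big, 10)
    partial_products = [tens * 10 * small, ones * small]
    return sum(partial_products), "decomposition"


print(multiply(12, 12))  # (144, 'direct_retrieval')
print(multiply(16, 12))  # (192, 'decomposition'), computed as 120 + 72
```

Note that this sketch evaluates familiarity before committing to a strategy, whereas the children studied by Siegler and Shrager apparently always attempt retrieval first; modeling that would require trying the lookup and falling back to computation only after it fails.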

In  addition  to  familiarity  evaluation,  it  is  probably  also  the  case  that  in  many 
domains  prior  history  of  success  with  a  strategy  influences  selection  of  the  strategy. 
Again  using  mathematics  as  an  example,  consider  the  task  of  solving  integrals.  If  one 
integration  technique  has  worked  for  many  problems,  one  is  likely  to  try  it  again  unless 
the  initial  evaluation  suggests  some  other  procedure.  The  Einstellung  effect  in  solving 
water  jug  problems  (Luchins,  1942)  is  an  example  of  where  prior  history  of  success  with 
a  (solution)  strategy  adversely  affects  strategy  selection.  The  Einstellung  effect  is 
eliminated  for  approximately  50%  of  the  subjects  when  they  are  simply  instructed  "Don't 
be  blind."  This  is  understandable  since  "prior  history  of  success  with  a  strategy"  is 
part  of  a  selection  mechanism  under  conscious  control. 

For almost any complex cognitive task there are multiple strategies that can be
used for solving it, typically one of which is more appropriate than another. As people
become proficient in performing the task, they also become relatively proficient in
selecting the best strategy to use for a particular instance. This paper has provided a
framework for understanding the role of strategy selection in question-answering and has
suggested what variables affect the selection process.

Notes

1For example, it is common for memory theorists to assume that Statement A was
inferred during the reading of a text if verification of it is faster than verification of a
different Statement B (e.g., Garnham, 1982; Singer & Ferreira, 1983). Another
interpretation, of course, is that Statement A is more plausible than Statement B and
therefore faster to verify (judge plausible) at test, regardless of which statements had
been inferred earlier.

2Just as a presented plausible statement can be verified by either strategy, a contradictory
statement  can  be  rejected  by  either  strategy,  viz.,  finding  its  exact  contradiction  in 
memory  and  noting  the  discrepancy  or  by  inferring  that  it  is  implausible.  Note  that 
failing  to  find  a  fact  using  the  direct  retrieval  strategy  is  not  a  valid  basis  for  deciding 
that  a  fact  was  implausible. 

3Note that the difference in response times is not due to different responses to
not-stated items; both groups are expected to respond affirmatively to not-stated
plausible probes.

4This issue will be discussed at length later in the paper.

5The only probe that had lower accuracy when "stated" in the story was the
contradictory statement when direct-retrieval was advised. Perhaps this results from
subjects not bothering to note the one opposite word in the almost-verbatim match of
the memory trace to the contradictory probe. That is, the direct-retrieval strategy is a
"literal" match strategy and sometimes subjects are sloppy and do not check every
word.

6ANOVAs using only the correct answer times and those including impossibles look
very similar and do not change any interpretations.

7It is unclear why the differences were not manifest in RTs. One explanation is
that the difference among levels of difficulty was really probability of being able to
answer the question, not difficulty of answering, per se. Consider, for example, one of
the difficult questions, "Bowie Kuhn is (was) commissioner of what sport?" It is
probably not difficult for those who know the answer (baseball). If a person knows the
answer, it can be retrieved as fast as many questions considered much easier, e.g.,
"How many eggs are in a dozen?" The difference among the question types then
would be seen only in measures such as the percentage of questions attempted.

8The pay-off formula was: 3.8/(0.15 x rt + 0.4) - 3.7, rounded to the nearest half point.
In  the  Estimate  Group,  subjects  were  given  an  additional  3  points  for  a  correct  response 
and  lost  3  points  for  an  incorrect  response.  In  the  Answer  Group,  they  received  the 
same  pay-off  for  accurate  and  inaccurate  responses  so  long  as  they  were  given  in  less 
than  one  second  after  the  button  press;  if  it  took  longer  than  that  to  speak  the  answer, 
subjects  lost  4  points  for  a  correct  answer  and  6  for  an  incorrect. 

9The  reason  why  the  RTs  for  the  first  phase  are  so  much  longer  than  the  second 
phase  is  that  the  first  phase  RTs  include  the  reading  time  of  the  question. 

10A  common  word  like  "table"  would  have  to  be  much  more  active  than  a  rare 
word  like  "hippopotamus"  for  the  feeling-of-knowing  process  to  recognize  it  as  having 
been seen recently. See, for example, Just & Carpenter (1980) or McClelland &
Rumelhart (1981) for a fuller discussion of similar assumptions.

11Some questions, such as "what is Dickens' phone number?", may seem to be
rejected from this initial evaluation; however, it is more likely that the decision is made
to use a particular question-answering strategy, and then it is determined that the
question is unanswerable.

12The terms were sometimes longer than one word, e.g., "Lady Godiva", "Howdy
Doody".

13Subjects  were  significantly  faster  and  more  likely  to  attempt  answers  in  the 
paired  conditions,  for  both  the  estimate  and  the  answer  task,  although  much  more  so  for 
the  estimate  task.  The  faster  judgments  did  not  interact  with  other  variables.  Although 
it  is  possible  to  construct  explanations  for  the  modest  differences  across  conditions,  it  is 
more  likely  that  any  differences  are  due  to  subject  effects,  and  dwelling  on  these  slight 
differences  would  confuse  the  picture. 

14The  means  in  the  Tables  reflect  the  obtained  scores  without  the  estimates  for 
missing  or  undefined  observations.  The  F  statistics  vary  depending  on  the  estimation 
procedure used for the missing observations. The Fs in question are those for the response
times. For example, if a subject did not attempt any difficult primed questions, then an
estimate of the time for that subject to attempt a difficult, primed question was required.
The Fs reported were the most conservative, i.e., gave the smallest Fs, of the various
estimation  procedures.  Proportion  attempted,  on  the  other  hand,  was  not  affected  since 
there  were  no  missing  observations.  Therefore,  those  statistics  are  not  subject  to 
interpretation. 

15Although this interaction was not reliable with the conservative estimation procedure
reported here, it was reliable with other procedures used for estimating missing data.

16The proposal that one retrieval can facilitate another retrieval has also appeared
in models of free and cued recall, e.g., Shiffrin (1970), Raaijmakers and Shiffrin (1981).

Table 1

Example  Story 

We  are  out  of  touch  with  problems  which  were  central  in  the  past. 
But  this  is  not  true  everywhere. 

The  setting  is  Burma. 

The  tiger  was  a  man-eater. 

It  suffered  from  an  old  gunshot  wound. 

The  villagers  did  not  dare  work  in  the  fields. 

In  a  disorderly  meeting  they  made  a  decision. 

They  asked  a  hunter  to  help  them. 

He  came  the  following  week. 

The  tiger  killed  a  man. 

It  had  attacked  him  in  a  small  ravine. 

It  carried  the  victim  away. 

It  concealed  the  kill  under  some  vines. 

The  hunter  followed  the  tiger's  trail. 

The  traces  were  distinct. 

He  found  the  tiger  asleep. 

The  shade  was  cool  there. 

The  hunter  considered  giving  the  tiger  a  sporting  chance. 

Then  he  shot  it. 

The  tiger  died  quietly. 

The  hunter  did  not  feel  right. 

The  villagers  understood  his  feelings. 

But  their  concern  was  practical. 

The  hunter  skinned  the  tiger. 

He  left  with  the  skin. 


Statements to Judge*


Highly  Plausible: 

The  tiger  was  dangerous. 

The  villagers  were  afraid  of  the  tiger. 

The  villagers  were  happy  that  the  tiger  was  dead. 

Moderately  Plausible: 

The  hunter  was  expected  to  solve  their  problem. 

The  hunter  thought  of  the  tiger’s  situation. 

The  hunter  used  his  best  judgement. 

Implausible:

The villagers had encountered many man-eating tigers.

The hunter was well paid for killing the tiger.

The villagers make their living primarily by hunting.

There are no guns in Burma.

The villagers are Hindu and do not believe in killing.

The village chieftain wore the tiger skin over his hips.

Contradictory: 

The  tiger  was  awake  when  the  hunter  found  him. 

The  villagers  live  in  Nepal. 

The  hunter  left  the  tiger’s  pelt  with  the  villagers. 

The  tiger  died  fighting. 

The  villagers  scorned  the  hunter's  feelings. 

The  tiger  concealed  his  victim  in  a  cave. 

*Although 6 implausible and 6 contradictory statements are listed here, only 6 of the
possible  12  were  selected  for  any  story.  This  ensured  an  equal  number  of  true  and  false 
test  statements. 

Table 2

Mean Response Times (and Accuracy) to Make Verification Judgments in
Experiment 1

                      BIASED FOR DIRECT RETRIEVAL    BIASED FOR PLAUSIBILITY
                      (80% STATED)                   (80% NOT-STATED)
                      Stated       Not-Stated        Stated       Not-Stated

Biased: Stories 1-6
  Highly              2.12         3.13              2.56         2.60
                      (.92)        (.88)             (.90)        (.90)
  Moderately          2.27         3.13              2.84         3.21
                      (.90)        (.77)             (.92)        (.74)
  Implaus/Contra*     2.57         2.80              3.11         2.96
                      (.82)        (.84)             (.86)        (.89)

Returned to 50% Stated: Stories 7-10
  Highly              2.15         2.31              2.30         2.57
                      (.95)        (.91)             (.98)        (.95)
  Moderately          2.31         3.02              2.33         2.97
                      (.94)        (.73)             (.96)        (.82)
  Implaus/Contra*     2.31         2.49              2.61         2.90
                      (.86)        (.92)             (.89)        (.90)

*The Implausible statements were never presented in the story. The
contradictory statements contradicted an explicitly presented statement
by substituting an opposite for one word in the statement.

Table 3

Mean Response Times and Accuracy For Judgments in Experiment 2

                                    OFFICIAL TASK
                    RECOGNITION               PLAUSIBILITY
                    % CORRECT    RT           % CORRECT    RT

IMMEDIATE
  Med. Stated       90.56        2.103        93.00        1.964
  High Stated       93.90        2.125        94.47        1.849
  Implausible       96.06        2.178        89.86        2.300

DELAYED
  Med. Stated       88.54        1.979        91.09        2.320
  High Stated       90.00        1.889        94.89        2.054
  Implausible       91.24        1.983        88.87        2.519

Table 4

Mean Response Time and Accuracy for Verification Judgments as a
Function of Advice Type, Plausibility and Whether the Probe Had Been
Presented in the Story

                        INFERENCE ADVICE          DIRECT RETRIEVAL ADVICE
                        % CORRECT    RT           % CORRECT    RT

Inf.(+) Dir. ret.(-)
  Med, not stated       74.54        2.706        82.35        3.281
  High, not stated      92.81        2.268        91.18        2.214
  Implausible           80.46        2.530        84.90        2.738

Dir. ret.(+) Inf.(-)
  Med, stated           92.16        2.444        93.46        2.049
  High, stated          98.04        2.101        99.35        2.001
  Contradictory         84.12        2.683        76.41        2.501

Table 5

Time to Estimate or Attempt Answers, Proportion of Questions
Attempted, Proportion Answered Correctly and Accuracy of Attempts as a
Function of Task and Question Difficulty, Experiment 4.

                   Time               Proportion         Total              Accuracy
Question           to Attempt         Attempted          Correct            of Attempt
Difficulty         EST      ANS       EST      ANS       EST      ANS       EST      ANS

Easy               1.650    2.512     86.02    91.62     80.62    80.35     93.60    87.54
Moderate           1.828    2.654     67.46    72.29     59.48    54.97     87.98    76.08
Hard               1.728    2.388     42.38    47.09     34.13    28.42     81.44    59.07
Impossible         1.524    2.925      7.56    25.13      0.00     0.00      0.00     0.00
All Attempted*     1.735    2.518     65.29    70.34     58.08    54.58     87.67    74.23

                                   ESTIMATE    ANSWER
RT for Answered Correctly*         1.722       2.396
RT for Answered Incorrectly*       2.069       3.009
RT for Not Attempted*              1.851       3.197
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Table  6 


Mean Proportion Attempted, Correct, Accuracy and Response Times to
Attempt Correct Answers, Give Answers, as a Function of Task and
Question Difficulty in Experiment 5


                   RT1: Time       Proportion        Total           Accuracy
Question           to Attempt      Attempted         Correct         of Attempt
Difficulty         EST     ANS     EST     ANS       EST     ANS     EST     ANS

Easy               1.385   1.580   84.13   85.85     78.70   79.89   93.27   95.99
Moderate           1.447   1.671   68.69   68.94     60.67   59.83   88.43   88.40
Hard               1.427   1.775   40.47   37.51     32.69   29.96   80.98   80.34
Impossible           -       -      6.68    5.00       -       -       -       -

All Attempted*     1.420   1.675   64.43   64.10     57.35   56.56   87.56   88.24


                   RT2               RT1 + RT2
                   EST     ANS**     EST     ANS**

Easy               0.590   0.304     1.975   1.884
Moderate           0.595   0.274     2.042   1.945
Hard               0.541   0.317     1.968   2.092
Impossible           -       -         -       -

All Attempted*     0.575   0.298     1.995   1.974

- Undefined

*These means do not include the impossible items.

**RT2 in Answer Condition does not include late responses
(less than 1% of data).
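
As a consistency check on the lower panel of Table 6, the RT1 + RT2 column should equal the sum of the RT1 and RT2 means for the same task and difficulty. The sketch below performs that check on the tabled values. The tolerance and the names `TABLE6_RT` and `sum_mismatches` are assumptions introduced for illustration, not part of the original analysis.

```python
# Values copied from Table 6 (Experiment 5), in seconds.
# Each entry: (RT1, RT2, reported RT1 + RT2).
TABLE6_RT = {
    ("EST", "Easy"): (1.385, 0.590, 1.975),
    ("EST", "Moderate"): (1.447, 0.595, 2.042),
    ("EST", "Hard"): (1.427, 0.541, 1.968),
    ("EST", "All Attempted"): (1.420, 0.575, 1.995),
    ("ANS", "Easy"): (1.580, 0.304, 1.884),
    ("ANS", "Moderate"): (1.671, 0.274, 1.945),
    ("ANS", "Hard"): (1.775, 0.317, 2.092),
    ("ANS", "All Attempted"): (1.675, 0.298, 1.974),
}

# Assumed tolerance: allows for rounding of tabled means to three decimals.
TOLERANCE = 0.0015


def sum_mismatches(table, tol=TOLERANCE):
    """Return the cells whose reported total differs from RT1 + RT2 by more than tol."""
    return [
        key
        for key, (rt1, rt2, total) in table.items()
        if abs(rt1 + rt2 - total) > tol
    ]


if __name__ == "__main__":
    print(sum_mismatches(TABLE6_RT))
```

Every cell passes at this tolerance. Only the Answer condition's "All Attempted" row departs from an exact sum, by 0.001 s, which is consistent with rounding.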
Table 7

Proportion of Questions Attempted, Time to Attempt Answer or Time to
Say "Don’t Know" as a Function of Task, Question Difficulty and Priming


ESTIMATE

              PROP. ATTEMPTED        RT TO ATTEMPT          RT TO NOT ATTEMPT
              UNPRIMED   PRIMED      UNPRIMED   PRIMED      UNPRIMED   PRIMED

Easy            .82        .76         2.41       2.32        3.80       3.45
Moderate        .50        .52         2.79       2.58        3.05       3.34
Hard            .24        .31         2.72       2.70        2.80       2.75


ANSWER

              PROP. ATTEMPTED        RT TO ATTEMPT          RT TO NOT ATTEMPT
              UNPRIMED   PRIMED      UNPRIMED   PRIMED      UNPRIMED   PRIMED

Easy            .77        .81         3.42       3.64        7.56       6.76
Moderate        .54        .48         4.58       4.67        6.04       8.30
Hard            .39        .38         5.10       5.47        4.59       5.51


Figure  Captions 

1. Flowchart model of strategy selection from Reder (1982).

2. Difference in RT between highly and moderately plausible statements (collapsed
over stated/not-stated) as a function of strategy-bias, when bias imposed (stories 1-6)
and not imposed (stories 7-10) in Exp. 1.

3. Difference in RT between stated and not-stated statements (collapsed over
plausibility) as a function of strategy-bias, when bias imposed (stories 1-6) and not
imposed (stories 7-10) in Exp. 1.